Overview


Dataset statistics

Number of variables: 61
Number of observations: 2430
Missing cells: 1875
Missing cells (%): 1.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.1 MiB
Average record size in memory: 496.0 B
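
The percentage and memory figures above follow from the raw counts. The sketch below is a minimal consistency check, assuming the missing-cell percentage is taken over all cells (variables × observations) and that 1 MiB is 1024² bytes; the variable names are illustrative only.

```python
# Figures copied from the Dataset statistics above.
n_variables = 61
n_observations = 2430
missing_cells = 1875
avg_record_bytes = 496.0

# Share of missing cells over the whole table (assumed denominator).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Total memory implied by the average record size.
total_mib = avg_record_bytes * n_observations / 1024**2

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"size in memory: {total_mib:.1f} MiB")
```

Both derived values round to the figures the report shows, which suggests the summary is internally consistent.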

Variable types

Categorical: 52
Numeric: 4
Text: 3
Boolean: 2

Alerts

AJCC ID (2018+) is highly overall correlated with Derived EOD 2018 M Recode (2018+) and 6 other fields High correlation
COD to site rec KM is highly overall correlated with COD to site recode and 5 other fields High correlation
COD to site recode is highly overall correlated with COD to site rec KM and 5 other fields High correlation
COD to site recode ICD-O-3 2023 Revision is highly overall correlated with COD to site rec KM and 5 other fields High correlation
COD to site recode ICD-O-3 2023 Revision Expanded (1999+) is highly overall correlated with COD to site rec KM and 5 other fields High correlation
Chemotherapy recode (yes, no/unk) is highly overall correlated with Derived EOD 2018 Stage Group Recode (2018+) and 3 other fields High correlation
Derived EOD 2018 M Recode (2018+) is highly overall correlated with AJCC ID (2018+) and 8 other fields High correlation
Derived EOD 2018 N Recode (2018+) is highly overall correlated with AJCC ID (2018+) and 7 other fields High correlation
Derived EOD 2018 Stage Group Recode (2018+) is highly overall correlated with AJCC ID (2018+) and 5 other fields High correlation
Derived EOD 2018 T Recode (2018+) is highly overall correlated with AJCC ID (2018+) and 6 other fields High correlation
Derived Summary Grade 2018 (2018+) is highly overall correlated with Grade Clinical (2018+) and 1 other field High correlation
EOD Mets Recode (2018+) is highly overall correlated with Derived EOD 2018 M Recode (2018+) and 2 other fields High correlation
EOD Primary Tumor Recode (2018+) is highly overall correlated with Derived EOD 2018 T Recode (2018+) High correlation
EOD Regional Nodes Recode (2018+) is highly overall correlated with Derived EOD 2018 N Recode (2018+) High correlation
First malignant primary indicator is highly overall correlated with Record number recode and 2 other fields High correlation
Grade Clinical (2018+) is highly overall correlated with Derived Summary Grade 2018 (2018+) High correlation
Grade Pathological (2018+) is highly overall correlated with Derived Summary Grade 2018 (2018+) High correlation
Median household income inflation adj to 2023 is highly overall correlated with Patient ID High correlation
Mets at DX-Distant LN (2016+) is highly overall correlated with Mets at DX-Other (2016+) and 4 other fields High correlation
Mets at DX-Other (2016+) is highly overall correlated with Mets at DX-Distant LN (2016+) and 4 other fields High correlation
PRCDA 2020 is highly overall correlated with Patient ID High correlation
Patient ID is highly overall correlated with Median household income inflation adj to 2023 and 1 other field High correlation
Primary Site is highly overall correlated with AJCC ID (2018+) and 4 other fields High correlation
Primary Site - labeled is highly overall correlated with AJCC ID (2018+) and 4 other fields High correlation
RX Summ--Scope Reg LN Sur (2003+) is highly overall correlated with Reason no cancer-directed surgery and 1 other field High correlation
RX Summ--Surg Prim Site (1998+) is highly overall correlated with RX Summ--Surg/Rad Seq and 1 other field High correlation
RX Summ--Surg/Rad Seq is highly overall correlated with RX Summ--Surg Prim Site (1998+) High correlation
RX Summ--Systemic/Sur Seq (2007+) is highly overall correlated with Chemotherapy recode (yes, no/unk) High correlation
Reason no cancer-directed surgery is highly overall correlated with RX Summ--Scope Reg LN Sur (2003+) and 1 other field High correlation
Record number recode is highly overall correlated with First malignant primary indicator and 2 other fields High correlation
Regional nodes examined (1988+) is highly overall correlated with RX Summ--Scope Reg LN Sur (2003+) and 1 other field High correlation
Regional nodes positive (1988+) is highly overall correlated with Regional nodes examined (1988+) High correlation
SEER Combined Mets at DX-bone (2010+) is highly overall correlated with Mets at DX-Distant LN (2016+) and 4 other fields High correlation
SEER Combined Mets at DX-brain (2010+) is highly overall correlated with Mets at DX-Distant LN (2016+) and 4 other fields High correlation
SEER Combined Mets at DX-liver (2010+) is highly overall correlated with Derived EOD 2018 M Recode (2018+) and 6 other fields High correlation
SEER Combined Mets at DX-lung (2010+) is highly overall correlated with Mets at DX-Distant LN (2016+) and 4 other fields High correlation
SEER cause-specific death classification is highly overall correlated with COD to site rec KM and 5 other fields High correlation
SEER other cause of death classification is highly overall correlated with COD to site rec KM and 5 other fields High correlation
Sequence number is highly overall correlated with First malignant primary indicator and 2 other fields High correlation
Site recode ICD-O-3 2023 Revision Expanded is highly overall correlated with AJCC ID (2018+) and 4 other fields High correlation
Survival months flag is highly overall correlated with Type of Reporting Source High correlation
Total number of in situ/malignant tumors for patient is highly overall correlated with First malignant primary indicator and 2 other fields High correlation
Tumor Size Summary (2016+) is highly overall correlated with Chemotherapy recode (yes, no/unk) and 1 other field High correlation
Type of Reporting Source is highly overall correlated with Survival months flag High correlation
Vital status recode (study cutoff used) is highly overall correlated with COD to site rec KM and 6 other fields High correlation
Year of follow-up recode is highly overall correlated with Vital status recode (study cutoff used) High correlation
Site recode ICD-O-3 2023 Revision Expanded is highly imbalanced (62.7%) Imbalance
Grade Clinical (2018+) is highly imbalanced (70.8%) Imbalance
Diagnostic Confirmation is highly imbalanced (87.9%) Imbalance
Derived EOD 2018 N Recode (2018+) is highly imbalanced (72.9%) Imbalance
Derived EOD 2018 M Recode (2018+) is highly imbalanced (55.2%) Imbalance
RX Summ--Surg Oth Reg/Dis (2003+) is highly imbalanced (84.5%) Imbalance
RX Summ--Surg/Rad Seq is highly imbalanced (98.8%) Imbalance
Reason no cancer-directed surgery is highly imbalanced (60.5%) Imbalance
Radiation recode is highly imbalanced (96.8%) Imbalance
RX Summ--Systemic/Sur Seq (2007+) is highly imbalanced (59.2%) Imbalance
EOD Primary Tumor Recode (2018+) is highly imbalanced (51.8%) Imbalance
EOD Regional Nodes Recode (2018+) is highly imbalanced (68.2%) Imbalance
EOD Mets Recode (2018+) is highly imbalanced (66.3%) Imbalance
Regional nodes examined (1988+) is highly imbalanced (67.5%) Imbalance
Regional nodes positive (1988+) is highly imbalanced (67.9%) Imbalance
SEER Combined Mets at DX-bone (2010+) is highly imbalanced (92.2%) Imbalance
SEER Combined Mets at DX-brain (2010+) is highly imbalanced (93.0%) Imbalance
SEER Combined Mets at DX-liver (2010+) is highly imbalanced (71.1%) Imbalance
SEER Combined Mets at DX-lung (2010+) is highly imbalanced (92.2%) Imbalance
Mets at DX-Distant LN (2016+) is highly imbalanced (91.7%) Imbalance
Mets at DX-Other (2016+) is highly imbalanced (78.5%) Imbalance
COD to site recode is highly imbalanced (82.1%) Imbalance
SEER cause-specific death classification is highly imbalanced (79.7%) Imbalance
SEER other cause of death classification is highly imbalanced (78.8%) Imbalance
Survival months flag is highly imbalanced (92.6%) Imbalance
COD to site rec KM is highly imbalanced (82.1%) Imbalance
COD to site recode ICD-O-3 2023 Revision is highly imbalanced (81.9%) Imbalance
COD to site recode ICD-O-3 2023 Revision Expanded (1999+) is highly imbalanced (82.0%) Imbalance
Vital status recode (study cutoff used) is highly imbalanced (50.0%) Imbalance
Sequence number is highly imbalanced (53.0%) Imbalance
Primary by international rules is highly imbalanced (96.5%) Imbalance
Record number recode is highly imbalanced (58.2%) Imbalance
Total number of in situ/malignant tumors for patient is highly imbalanced (53.9%) Imbalance
Total number of benign/borderline tumors for patient is highly imbalanced (93.7%) Imbalance
Year of follow-up recode is highly imbalanced (74.4%) Imbalance
Type of Reporting Source is highly imbalanced (80.7%) Imbalance
RX Summ--Scope Reg LN Sur (2003+) has 1875 (77.2%) missing values Missing

Reproduction

Analysis started: 2025-07-24 19:16:54.658752
Analysis finished: 2025-07-24 19:17:12.486206
Duration: 17.83 seconds
Software version: ydata-profiling v4.16.1
Download configuration: config.json
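
The reported duration can be checked against the two analysis timestamps. A minimal sketch using only the standard library; the timestamp strings are copied from the lines above and the format string is an assumption about how they were written.

```python
from datetime import datetime

# Assumed layout of the timestamps printed in the Reproduction section.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2025-07-24 19:16:54.658752", FMT)
finished = datetime.strptime("2025-07-24 19:17:12.486206", FMT)

# Elapsed wall-clock time between start and finish, in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```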

Variables

Distinct: 4
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Value | Count
White | 1472
Black | 465
Other (American Indian/AK Native, Asian/Pacific Islander) | 443
Unknown | 50

Length

Max length: 57
Median length: 5
Mean length: 14.520988
Min length: 5

Characters and Unicode

Total characters: 35286
Distinct characters: 31
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
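
The length and character figures for this variable can be reproduced from its value counts alone. The sketch below is a minimal check, assuming lengths are measured in characters per observation; the dictionary simply restates the counts shown for this variable.

```python
from statistics import median

# Category counts copied from this variable's value table.
value_counts = {
    "White": 1472,
    "Black": 465,
    "Other (American Indian/AK Native, Asian/Pacific Islander)": 443,
    "Unknown": 50,
}

# One string length per observation.
lengths = [len(value) for value, n in value_counts.items() for _ in range(n)]

total_chars = sum(lengths)
mean_length = total_chars / len(lengths)
median_length = median(lengths)
# Distinct characters across all category labels (spaces and punctuation included).
distinct_chars = len(set("".join(value_counts)))

print(total_chars, max(lengths), min(lengths), median_length)
print(f"{mean_length:.6f}", distinct_chars)
```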

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: White
2nd row: Black
3rd row: Black
4th row: White
5th row: White

Common Values

Value | Count | Frequency (%)
White | 1472 | 60.6%
Black | 465 | 19.1%
Other (American Indian/AK Native, Asian/Pacific Islander) | 443 | 18.2%
Unknown | 50 | 2.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common Values plot]

Words

Value | Count | Frequency (%)
white | 1472 | 31.7%
black | 465 | 10.0%
other | 443 | 9.5%
american | 443 | 9.5%
indian/ak | 443 | 9.5%
native | 443 | 9.5%
asian/pacific | 443 | 9.5%
islander | 443 | 9.5%
unknown | 50 | 1.1%
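
The word counts above are consistent with lowercasing each category label, splitting on whitespace, and trimming punctuation from the ends of each token. The sketch below illustrates that reading; the tokenization rule is an assumption inferred from the table, not a confirmed description of how the report was built.

```python
from collections import Counter
import string

# Category counts copied from this variable's Common Values table.
value_counts = {
    "White": 1472,
    "Black": 465,
    "Other (American Indian/AK Native, Asian/Pacific Islander)": 443,
    "Unknown": 50,
}

word_counts = Counter()
for value, n in value_counts.items():
    for token in value.lower().split():
        # Trim leading/trailing punctuation such as "(" or ","; inner "/" is kept.
        word = token.strip(string.punctuation)
        if word:
            word_counts[word] += n

total_words = sum(word_counts.values())
for word, n in word_counts.most_common():
    print(f"{word} | {n} | {100 * n / total_words:.1f}%")
```

Note that the percentages are shares of all word tokens, not of observations, which is why "white" appears at 31.7% here but 60.6% in the Common Values table.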

Most occurring characters

Value | Count | Frequency (%)
i | 4130 | 11.7%
e | 3244 | 9.2%
a | 3123 | 8.9%
n | 2365 | 6.7%
t | 2358 | 6.7%
(space) | 2215 | 6.3%
h | 1915 | 5.4%
c | 1794 | 5.1%
W | 1472 | 4.2%
A | 1329 | 3.8%
Other values (21) | 11341 | 32.1%


Sex
Categorical

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Value | Count
Female | 1265
Male | 1165

Length

Max length: 6
Median length: 6
Mean length: 5.0411523
Min length: 4

Characters and Unicode

Total characters: 12250
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Female
2nd row: Female
3rd row: Female
4th row: Female
5th row: Female

Common Values

Value | Count | Frequency (%)
Female | 1265 | 52.1%
Male | 1165 | 47.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common Values plot]

Words

Value | Count | Frequency (%)
female | 1265 | 52.1%
male | 1165 | 47.9%

Most occurring characters

Value | Count | Frequency (%)
e | 3695 | 30.2%
a | 2430 | 19.8%
l | 2430 | 19.8%
F | 1265 | 10.3%
m | 1265 | 10.3%
M | 1165 | 9.5%


Distinct: 5
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Value | Count
2022 | 782
2021 | 745
2019 | 331
2020 | 312
2018 | 260

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 9720
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022
2nd row: 2020
3rd row: 2018
4th row: 2019
5th row: 2021

Common Values

Value | Count | Frequency (%)
2022 | 782 | 32.2%
2021 | 745 | 30.7%
2019 | 331 | 13.6%
2020 | 312 | 12.8%
2018 | 260 | 10.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common Values plot]

Words

Value | Count | Frequency (%)
2022 | 782 | 32.2%
2021 | 745 | 30.7%
2019 | 331 | 13.6%
2020 | 312 | 12.8%
2018 | 260 | 10.7%

Most occurring characters

Value | Count | Frequency (%)
2 | 5051 | 52.0%
0 | 2742 | 28.2%
1 | 1336 | 13.7%
9 | 331 | 3.4%
8 | 260 | 2.7%


PRCDA 2020
Categorical

High correlation 

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Value | Count
Not PRCDA | 1363
PRCDA | 1067

Length

Max length: 9
Median length: 9
Mean length: 7.2436214
Min length: 5

Characters and Unicode

Total characters: 17602
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Not PRCDA
2nd row: Not PRCDA
3rd row: Not PRCDA
4th row: Not PRCDA
5th row: Not PRCDA

Common Values

Value | Count | Frequency (%)
Not PRCDA | 1363 | 56.1%
PRCDA | 1067 | 43.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common Values plot]

Words

Value | Count | Frequency (%)
prcda | 2430 | 64.1%
not | 1363 | 35.9%

Most occurring characters

Value | Count | Frequency (%)
P | 2430 | 13.8%
A | 2430 | 13.8%
D | 2430 | 13.8%
C | 2430 | 13.8%
R | 2430 | 13.8%
N | 1363 | 7.7%
(space) | 1363 | 7.7%
o | 1363 | 7.7%
t | 1363 | 7.7%


Site recode ICD-O-3 2023 Revision Expanded
Categorical

High correlation  Imbalance 

Distinct: 14
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Value | Count
Stomach | 1651
Small Intestine | 535
Colon And Rectum (Excluding Appendix) | 98
Digestive Other | 76
Retroperitoneum And Peritoneum | 29
Other values (9) | 41

Length

Max length: 37
Median length: 7
Mean length: 10.618519
Min length: 5

Characters and Unicode

Total characters: 25803
Distinct characters: 38
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): 0.2%

Sample

1st row: Stomach
2nd row: Small Intestine
3rd row: Small Intestine
4th row: Stomach
5th row: Stomach

Common Values

Value | Count | Frequency (%)
Stomach | 1651 | 67.9%
Small Intestine | 535 | 22.0%
Colon And Rectum (Excluding Appendix) | 98 | 4.0%
Digestive Other | 76 | 3.1%
Retroperitoneum And Peritoneum | 29 | 1.2%
Esophagus | 13 | 0.5%
Miscellaneous Neoplasms | 12 | 0.5%
Appendix | 5 | 0.2%
Soft Tissue | 5 | 0.2%
Pancreas | 2 | 0.1%
Other values (4) | 4 | 0.2%

Length

[Figure: Histogram of lengths of the category]

Words

Value | Count | Frequency (%)
stomach | 1651 | 46.9%
small | 535 | 15.2%
intestine | 535 | 15.2%
and | 130 | 3.7%
appendix | 103 | 2.9%
colon | 98 | 2.8%
rectum | 98 | 2.8%
excluding | 98 | 2.8%
digestive | 76 | 2.2%
other | 76 | 2.2%
Other values (18) | 117 | 3.3%

Most occurring characters

Value | Count | Frequency (%)
t | 3066 | 11.9%
m | 2356 | 9.1%
a | 2233 | 8.7%
S | 2191 | 8.5%
o | 1978 | 7.7%
c | 1863 | 7.2%
h | 1741 | 6.7%
e | 1692 | 6.6%
n | 1578 | 6.1%
l | 1305 | 5.1%
Other values (28) | 5800 | 22.5%


Primary Site - labeled
Categorical

High correlation 

Distinct: 45
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Value | Count
C16.9-Stomach, NOS | 478
C16.1-Fundus of stomach | 277
C16.6-Greater curvature of stomach NOS | 238
C16.2-Body of stomach | 206
C17.9-Small intestine, NOS | 186
Other values (40) | 1045

Length

Max length: 56
Median length: 44
Mean length: 23.500823
Min length: 11

Characters and Unicode

Total characters: 57107
Distinct characters: 59
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8
Unique (%): 0.3%

Sample

1st row: C16.3-Gastric antrum
2nd row: C17.1-Jejunum
3rd row: C17.9-Small intestine, NOS
4th row: C16.6-Greater curvature of stomach NOS
5th row: C16.1-Fundus of stomach

Common Values

Value | Count | Frequency (%)
C16.9-Stomach, NOS | 478 | 19.7%
C16.1-Fundus of stomach | 277 | 11.4%
C16.6-Greater curvature of stomach NOS | 238 | 9.8%
C16.2-Body of stomach | 206 | 8.5%
C17.9-Small intestine, NOS | 186 | 7.7%
C16.5-Lesser curvature of stomach NOS | 185 | 7.6%
C17.1-Jejunum | 157 | 6.5%
C17.0-Duodenum | 140 | 5.8%
C16.3-Gastric antrum | 111 | 4.6%
C16.0-Cardia, NOS | 103 | 4.2%
Other values (35) | 349 | 14.4%

Length

[Figure: Histogram of lengths of the category]

Words

Value | Count | Frequency (%)
nos | 1327 | 19.6%
of | 1007 | 14.9%
stomach | 956 | 14.1%
c16.9-stomach | 478 | 7.1%
curvature | 423 | 6.3%
c16.1-fundus | 277 | 4.1%
c16.6-greater | 238 | 3.5%
c16.2-body | 206 | 3.0%
intestine | 197 | 2.9%
c17.9-small | 186 | 2.8%
Other values (67) | 1464 | 21.7%

Most occurring characters

Value | Count | Frequency (%)
(space) | 4329 | 7.6%
t | 3246 | 5.7%
o | 3122 | 5.5%
a | 3072 | 5.4%
1 | 2710 | 4.7%
C | 2549 | 4.5%
e | 2491 | 4.4%
- | 2430 | 4.3%
. | 2430 | 4.3%
u | 2280 | 4.0%
Other values (49) | 28448 | 49.8%


Primary Site
Real number (ℝ)

High correlation 

Distinct: 45
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 179.22346
Minimum: 154
Maximum: 809
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 38.0 KiB

Quantile statistics

Minimum: 154
5-th percentile: 161
Q1: 163
median: 169
Q3: 171
95-th percentile: 269
Maximum: 809
Range: 655
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 60.416237
Coefficient of variation (CV): 0.33710006
Kurtosis: 60.871169
Mean: 179.22346
Median Absolute Deviation (MAD): 4
Skewness: 7.2221934
Sum: 435513
Variance: 3650.1217
Monotonicity: Not monotonic
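
Several of the quantile and descriptive statistics are simple functions of one another. The sketch below restates those relationships using the reported figures as inputs; it is a consistency check on the numbers in this section, not a recomputation from the underlying data.

```python
# Figures copied from the Quantile and Descriptive statistics of Primary Site.
n = 2430                      # number of observations (none missing)
total = 435513                # Sum
q1, q3 = 163, 171
minimum, maximum = 154, 809
std = 60.416237               # Standard deviation

mean = total / n              # Mean = Sum / n
value_range = maximum - minimum
iqr = q3 - q1                 # Interquartile range
cv = std / mean               # Coefficient of variation
variance = std ** 2           # Variance is the squared standard deviation

print(f"mean={mean:.5f} range={value_range} IQR={iqr}")
print(f"CV={cv:.4f} variance={variance:.4f}")
```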
[Figure: Histogram with fixed size bins (bins=45)]

Common Values

Value | Count | Frequency (%)
169 | 478 | 19.7%
161 | 277 | 11.4%
166 | 238 | 9.8%
162 | 206 | 8.5%
179 | 186 | 7.7%
165 | 185 | 7.6%
171 | 157 | 6.5%
170 | 140 | 5.8%
163 | 111 | 4.6%
160 | 103 | 4.2%
Other values (35) | 349 | 14.4%

Minimum 10 values

Value | Count | Frequency (%)
154 | 3 | 0.1%
155 | 9 | 0.4%
159 | 1 | < 0.1%
160 | 103 | 4.2%
161 | 277 | 11.4%
162 | 206 | 8.5%
163 | 111 | 4.6%
164 | 3 | 0.1%
165 | 185 | 7.6%
166 | 238 | 9.8%

Maximum 10 values

Value | Count | Frequency (%)
809 | 9 | 0.4%
763 | 1 | < 0.1%
762 | 2 | 0.1%
495 | 4 | 0.2%
494 | 1 | < 0.1%
488 | 2 | 0.1%
482 | 7 | 0.3%
481 | 14 | 0.6%
480 | 6 | 0.2%
382 | 1 | < 0.1%
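
The numeric Primary Site values line up, count for count, with the labeled codes in Primary Site - labeled (169 with C16.9, 161 with C16.1, 179 with C17.9). That suggests each integer is an ICD-O-3 topography code with the leading "C" and the decimal point dropped, so the variable behaves as a category identifier rather than a true quantity. A minimal sketch under that assumption; the helper name `topography_code` is hypothetical.

```python
def topography_code(primary_site: int) -> str:
    """Format an integer such as 169 as an ICD-O-3 style code such as 'C16.9'.

    Assumes the last digit is the subsite and the leading digits are the site.
    """
    site, subsite = divmod(primary_site, 10)
    return f"C{site:02d}.{subsite}"


# Values taken from the tables above.
for code in (169, 161, 179, 154, 809):
    print(code, "->", topography_code(code))
```

If this reading holds, summary statistics such as the mean, skewness, and kurtosis of Primary Site carry little meaning, and the variable is better treated as categorical.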

Derived Summary Grade 2018 (2018+)
Categorical

High correlation 

Distinct: 7
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Value | Count
L | 1245
9 | 855
H | 295
A | 24
C | 5
Other values (2) | 6

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 2430
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: L
2nd row: L
3rd row: 9
4th row: L
5th row: 9

Common Values

Value | Count | Frequency (%)
L | 1245 | 51.2%
9 | 855 | 35.2%
H | 295 | 12.1%
A | 24 | 1.0%
C | 5 | 0.2%
B | 4 | 0.2%
D | 2 | 0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common Values plot]

Words

Value | Count | Frequency (%)
l | 1245 | 51.2%
9 | 855 | 35.2%
h | 295 | 12.1%
a | 24 | 1.0%
c | 5 | 0.2%
b | 4 | 0.2%
d | 2 | 0.1%


Grade Clinical (2018+)
Categorical

High correlation  Imbalance 

Distinct: 7
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Value | Count
9 | 2007
L | 341
H | 69
A | 6
C | 4
Other values (2) | 3

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 2430
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 9
2nd row: 9
3rd row: 9
4th row: 9
5th row: 9

Common Values

Value | Count | Frequency (%)
9 | 2007 | 82.6%
L | 341 | 14.0%
H | 69 | 2.8%
A | 6 | 0.2%
C | 4 | 0.2%
D | 2 | 0.1%
B | 1 | < 0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common Values plot]

Words

Value | Count | Frequency (%)
9 | 2007 | 82.6%
l | 341 | 14.0%
h | 69 | 2.8%
a | 6 | 0.2%
c | 4 | 0.2%
d | 2 | 0.1%
b | 1 | < 0.1%


Grade Pathological (2018+)
Categorical

High correlation 

Distinct: 6
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Value | Count
9 | 1089
L | 1073
H | 243
A | 19
B | 4

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 2430
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: L
2nd row: L
3rd row: 9
4th row: L
5th row: 9

Common Values

Value | Count | Frequency (%)
9 | 1089 | 44.8%
L | 1073 | 44.2%
H | 243 | 10.0%
A | 19 | 0.8%
B | 4 | 0.2%
C | 2 | 0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common Values plot]

Words

Value | Count | Frequency (%)
9 | 1089 | 44.8%
l | 1073 | 44.2%
h | 243 | 10.0%
a | 19 | 0.8%
b | 4 | 0.2%
c | 2 | 0.1%


Diagnostic Confirmation
Categorical

Imbalance 

Distinct: 6
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Value | Count
Positive histology | 2323
Positive exfoliative cytology, no positive histology | 81
Radiography without microscopic confirm | 15
Unknown | 5
Clinical diagnosis only | 4

Length

Max length: 53
Median length: 18
Mean length: 19.277366
Min length: 7

Characters and Unicode

Total characters: 46844
Distinct characters: 30
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Positive histology
2nd row: Positive histology
3rd row: Positive histology
4th row: Positive histology
5th row: Positive histology

Common Values

Value | Count | Frequency (%)
Positive histology | 2323 | 95.6%
Positive exfoliative cytology, no positive histology | 81 | 3.3%
Radiography without microscopic confirm | 15 | 0.6%
Unknown | 5 | 0.2%
Clinical diagnosis only | 4 | 0.2%
Direct visualization without microscopic confirmation | 2 | 0.1%
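
The Alerts list flags Diagnostic Confirmation as highly imbalanced (87.9%). That figure is reproduced by one minus the normalized Shannon entropy of the class counts above. The sketch below assumes this is the score being reported; the formula was checked only against this variable's numbers and the function name is illustrative.

```python
import math

# Class counts copied from the Common Values table of Diagnostic Confirmation.
counts = [2323, 81, 15, 5, 4, 2]


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (base 2)
    of the class distribution and k is the number of classes.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    """
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


score = imbalance_score(counts)
print(f"{100 * score:.1f}%")
```

Because the score is normalized by the number of classes, a variable with many rare categories can be flagged even when its largest class is well below 90%.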

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common Values plot]

Words

Value | Count | Frequency (%)
positive | 2485 | 47.6%
histology | 2404 | 46.1%
exfoliative | 81 | 1.6%
cytology | 81 | 1.6%
no | 81 | 1.6%
without | 17 | 0.3%
microscopic | 17 | 0.3%
radiography | 15 | 0.3%
confirm | 15 | 0.3%
unknown | 5 | 0.1%
Other values (6) | 18 | 0.3%

Most occurring characters

Value | Count | Frequency (%)
o | 7717 | 16.5%
i | 7645 | 16.3%
t | 5091 | 10.9%
s | 4916 | 10.5%
(space) | 2789 | 6.0%
e | 2649 | 5.7%
y | 2585 | 5.5%
l | 2580 | 5.5%
v | 2568 | 5.5%
g | 2504 | 5.3%
Other values (20) | 5800 | 12.4%


AJCC ID (2018+)
Categorical

High correlation 

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Value | Count
GIST: Gastric and Omental | 1652
GIST: Small Intestinal, Esophageal, Colorectal, Mesenteric, and Peritoneal | 677
No AJCC Chapter | 101

Length

Max length: 74
Median length: 25
Mean length: 38.235802
Min length: 15

Characters and Unicode

Total characters: 92913
Distinct characters: 30
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowGIST: Gastric and Omental
2nd rowGIST: Small Intestinal, Esophageal, Colorectal, Mesenteric, and Peritoneal
3rd rowGIST: Small Intestinal, Esophageal, Colorectal, Mesenteric, and Peritoneal
4th rowGIST: Gastric and Omental
5th rowGIST: Gastric and Omental

Common Values

Value | Count | Frequency (%)
GIST: Gastric and Omental | 1652 | 68.0%
GIST: Small Intestinal, Esophageal, Colorectal, Mesenteric, and Peritoneal | 677 | 27.9%
No AJCC Chapter | 101 | 4.2%
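
The percentages in the table above follow from dividing each count by the 2430 observations. A minimal sketch of that arithmetic, assuming one-decimal rounding as displayed in the report (the helper name frequency_table is hypothetical):

```python
def frequency_table(counts, total):
    """Return (value, count, percent) rows, largest count first."""
    rows = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    # Percent of all observations, rounded to one decimal place.
    return [(value, n, round(100 * n / total, 1)) for value, n in rows]


# Counts copied from the Common Values table for AJCC ID (2018+).
counts = {
    "GIST: Gastric and Omental": 1652,
    "GIST: Small Intestinal, Esophageal, Colorectal, Mesenteric, and Peritoneal": 677,
    "No AJCC Chapter": 101,
}
total = 2430  # number of observations reported in the overview

for value, n, pct in frequency_table(counts, total):
    print(f"{value}: {n} ({pct}%)")
```

The three counts sum to 2430, which is consistent with the variable having no missing values.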

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
gist | 2329 | 18.9%
and | 2329 | 18.9%
gastric | 1652 | 13.4%
omental | 1652 | 13.4%
small | 677 | 5.5%
intestinal | 677 | 5.5%
esophageal | 677 | 5.5%
colorectal | 677 | 5.5%
mesenteric | 677 | 5.5%
peritoneal | 677 | 5.5%
Other values (3) | 303 | 2.5%

Most occurring characters

Value | Count | Frequency (%)
(space) | 9897 | 10.7%
a | 9796 | 10.5%
e | 7169 | 7.7%
t | 6790 | 7.3%
n | 6689 | 7.2%
l | 6391 | 6.9%
G | 3981 | 4.3%
r | 3784 | 4.1%
s | 3683 | 4.0%
i | 3683 | 4.0%
Other values (20) | 31050 | 33.4%

Most occurring categories

ValueCountFrequency (%)
(unknown) 92913
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 92913
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 92913
100.0%

Derived EOD 2018 T Recode (2018+)
Categorical

High correlation 

Distinct7
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size38.0 KiB
T2
807 
T1
507 
T3
487 
T4
297 
TX
228 
Other values (2)
104 

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters4860
Distinct characters8
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowT2
2nd rowT2
3rd rowT2
4th rowT2
5th rowT2

Common Values

Value | Count | Frequency (%)
T2 | 807 | 33.2%
T1 | 507 | 20.9%
T3 | 487 | 20.0%
T4 | 297 | 12.2%
TX | 228 | 9.4%
88 | 101 | 4.2%
T0 | 3 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
t2 | 807 | 33.2%
t1 | 507 | 20.9%
t3 | 487 | 20.0%
t4 | 297 | 12.2%
tx | 228 | 9.4%
88 | 101 | 4.2%
t0 | 3 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
T | 2329 | 47.9%
2 | 807 | 16.6%
1 | 507 | 10.4%
3 | 487 | 10.0%
4 | 297 | 6.1%
X | 228 | 4.7%
8 | 202 | 4.2%
0 | 3 | 0.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 4860
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 4860
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 4860
100.0%

Derived EOD 2018 N Recode (2018+)
Categorical

High correlation  Imbalance 

Distinct3
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size38.0 KiB
N0
2262 
88
 
101
N1
 
67

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters4860
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowN0
2nd rowN0
3rd rowN0
4th rowN0
5th rowN0

Common Values

Value | Count | Frequency (%)
N0 | 2262 | 93.1%
88 | 101 | 4.2%
N1 | 67 | 2.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
n0 | 2262 | 93.1%
88 | 101 | 4.2%
n1 | 67 | 2.8%

Most occurring characters

Value | Count | Frequency (%)
N | 2329 | 47.9%
0 | 2262 | 46.5%
8 | 202 | 4.2%
1 | 67 | 1.4%

Most occurring categories

ValueCountFrequency (%)
(unknown) 4860
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 4860
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 4860
100.0%

Derived EOD 2018 M Recode (2018+)
Categorical

High correlation  Imbalance 

Distinct3
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size38.0 KiB
M0
2088 
M1
241 
88
 
101

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters4860
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowM0
2nd rowM0
3rd rowM0
4th rowM0
5th rowM0

Common Values

Value | Count | Frequency (%)
M0 | 2088 | 85.9%
M1 | 241 | 9.9%
88 | 101 | 4.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
m0 | 2088 | 85.9%
m1 | 241 | 9.9%
88 | 101 | 4.2%

Most occurring characters

Value | Count | Frequency (%)
M | 2329 | 47.9%
0 | 2088 | 43.0%
1 | 241 | 5.0%
8 | 202 | 4.2%

Most occurring categories

ValueCountFrequency (%)
(unknown) 4860
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 4860
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 4860
100.0%

Derived EOD 2018 Stage Group Recode (2018+)
Categorical

High correlation 

Distinct10
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size38.0 KiB
99
678 
1A
656 
4
282 
1
203 
2
176 
Other values (5)
435 

Length

Max length2
Median length2
Mean length1.7234568
Min length1

Characters and Unicode

Total characters4188
Distinct characters8
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1A
2nd row1
3rd row99
4th row1A
5th row99

Common Values

Value | Count | Frequency (%)
99 | 678 | 27.9%
1A | 656 | 27.0%
4 | 282 | 11.6%
1 | 203 | 8.4%
2 | 176 | 7.2%
1B | 136 | 5.6%
3B | 114 | 4.7%
88 | 101 | 4.2%
3A | 73 | 3.0%
3 | 11 | 0.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
99 | 678 | 27.9%
1a | 656 | 27.0%
4 | 282 | 11.6%
1 | 203 | 8.4%
2 | 176 | 7.2%
1b | 136 | 5.6%
3b | 114 | 4.7%
88 | 101 | 4.2%
3a | 73 | 3.0%
3 | 11 | 0.5%

Most occurring characters

Value | Count | Frequency (%)
9 | 1356 | 32.4%
1 | 995 | 23.8%
A | 729 | 17.4%
4 | 282 | 6.7%
B | 250 | 6.0%
8 | 202 | 4.8%
3 | 198 | 4.7%
2 | 176 | 4.2%

Most occurring categories

ValueCountFrequency (%)
(unknown) 4188
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 4188
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 4188
100.0%

RX Summ--Surg Prim Site (1998+)
Categorical

High correlation 

Distinct28
Distinct (%)1.2%
Missing0
Missing (%)0.0%
Memory size38.0 KiB
30
1063 
00
635 
27
141 
33
116 
32
 
83
Other values (23)
392 

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters4860
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique5 ?
Unique (%)0.2%

Sample

1st row30
2nd row30
3rd row30
4th row30
5th row30

Common Values

Value | Count | Frequency (%)
30 | 1063 | 43.7%
00 | 635 | 26.1%
27 | 141 | 5.8%
33 | 116 | 4.8%
32 | 83 | 3.4%
20 | 69 | 2.8%
51 | 53 | 2.2%
60 | 51 | 2.1%
61 | 47 | 1.9%
40 | 41 | 1.7%
Other values (18) | 131 | 5.4%

Length

Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
0 | 2526 | 52.0%
3 | 1407 | 29.0%
2 | 332 | 6.8%
7 | 142 | 2.9%
1 | 131 | 2.7%
6 | 114 | 2.3%
5 | 79 | 1.6%
4 | 60 | 1.2%
9 | 54 | 1.1%
8 | 15 | 0.3%

Most occurring categories

ValueCountFrequency (%)
(unknown) 4860
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 4860
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 4860
100.0%

RX Summ--Scope Reg LN Sur (2003+)
Categorical

High correlation  Missing 

Distinct6
Distinct (%)1.1%
Missing1875
Missing (%)77.2%
Memory size38.0 KiB
4 or more regional lymph nodes removed
273 
1 to 3 regional lymph nodes removed
206 
Unknown or not applicable
62 
Biopsy or aspiration of regional lymph node, NOS
 
10
Number of regional lymph nodes removed unknown
 
3

Length

Max length48
Median length46
Mean length35.636036
Min length25

Characters and Unicode

Total characters19778
Distinct characters32
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.2%

Sample

1st row4 or more regional lymph nodes removed
2nd row1 to 3 regional lymph nodes removed
3rd row4 or more regional lymph nodes removed
4th row4 or more regional lymph nodes removed
5th rowUnknown or not applicable

Common Values

Value | Count | Frequency (%)
4 or more regional lymph nodes removed | 273 | 11.2%
1 to 3 regional lymph nodes removed | 206 | 8.5%
Unknown or not applicable | 62 | 2.6%
Biopsy or aspiration of regional lymph node, NOS | 10 | 0.4%
Number of regional lymph nodes removed unknown | 3 | 0.1%
Sentinel lymph node biopsy | 1 | < 0.1%
(Missing) | 1875 | 77.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
lymph | 493 | 13.3%
regional | 492 | 13.3%
nodes | 482 | 13.0%
removed | 482 | 13.0%
or | 345 | 9.3%
4 | 273 | 7.4%
more | 273 | 7.4%
1 | 206 | 5.6%
to | 206 | 5.6%
3 | 206 | 5.6%
Other values (10) | 248 | 6.7%

Most occurring characters

Value | Count | Frequency (%)
(space) | 3151 | 15.9%
o | 2452 | 12.4%
e | 2289 | 11.6%
r | 1605 | 8.1%
n | 1254 | 6.3%
m | 1251 | 6.3%
l | 1110 | 5.6%
d | 975 | 4.9%
p | 638 | 3.2%
a | 636 | 3.2%
Other values (22) | 4417 | 22.3%

Most occurring categories

ValueCountFrequency (%)
(unknown) 19778
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 19778
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 19778
100.0%

RX Summ--Surg Oth Reg/Dis (2003+)
Categorical

Imbalance 

Distinct7
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size38.0 KiB
None; diagnosed at autopsy
2291 
Non-primary surgical procedure to other regional sites
 
48
Non-primary surgical procedure to distant site
 
47
Unknown; death certificate only
 
21
Non-primary surgical procedure performed
 
16
Other values (2)
 
7

Length

Max length60
Median length26
Mean length27.169136
Min length26

Characters and Unicode

Total characters66021
Distinct characters31
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNone; diagnosed at autopsy
2nd rowNone; diagnosed at autopsy
3rd rowNone; diagnosed at autopsy
4th rowNone; diagnosed at autopsy
5th rowNone; diagnosed at autopsy

Common Values

Value | Count | Frequency (%)
None; diagnosed at autopsy | 2291 | 94.3%
Non-primary surgical procedure to other regional sites | 48 | 2.0%
Non-primary surgical procedure to distant site | 47 | 1.9%
Unknown; death certificate only | 21 | 0.9%
Non-primary surgical procedure performed | 16 | 0.7%
Any combo of sur proc to oth rg, dis lym nd, and/or dis site | 5 | 0.2%
Non-primary surgical procedure to distant lymph node(s) | 2 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
none | 2291 | 22.9%
diagnosed | 2291 | 22.9%
at | 2291 | 22.9%
autopsy | 2291 | 22.9%
non-primary | 113 | 1.1%
surgical | 113 | 1.1%
procedure | 113 | 1.1%
to | 102 | 1.0%
site | 52 | 0.5%
distant | 49 | 0.5%
Other values (21) | 308 | 3.1%

Most occurring characters

Value | Count | Frequency (%)
(space) | 7584 | 11.5%
o | 7387 | 11.2%
a | 7243 | 11.0%
e | 5101 | 7.7%
t | 4998 | 7.6%
s | 4909 | 7.4%
n | 4893 | 7.4%
d | 4803 | 7.3%
i | 2766 | 4.2%
p | 2540 | 3.8%
Other values (21) | 13797 | 20.9%

Most occurring categories

ValueCountFrequency (%)
(unknown) 66021
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 66021
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 66021
100.0%

RX Summ--Surg/Rad Seq
Categorical

High correlation  Imbalance 

Distinct3
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size38.0 KiB
No radiation and/or no surgery; unknown if surgery and/or radiation given
2426 
Radiation after surgery
 
2
Radiation prior to surgery
 
2

Length

Max length73
Median length73
Mean length72.920165
Min length23

Characters and Unicode

Total characters177196
Distinct characters22
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNo radiation and/or no surgery; unknown if surgery and/or radiation given
2nd rowNo radiation and/or no surgery; unknown if surgery and/or radiation given
3rd rowNo radiation and/or no surgery; unknown if surgery and/or radiation given
4th rowNo radiation and/or no surgery; unknown if surgery and/or radiation given
5th rowNo radiation and/or no surgery; unknown if surgery and/or radiation given

Common Values

Value | Count | Frequency (%)
No radiation and/or no surgery; unknown if surgery and/or radiation given | 2426 | 99.8%
Radiation after surgery | 2 | 0.1%
Radiation prior to surgery | 2 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
radiation | 4856 | 18.2%
surgery | 4856 | 18.2%
no | 4852 | 18.2%
and/or | 4852 | 18.2%
unknown | 2426 | 9.1%
if | 2426 | 9.1%
given | 2426 | 9.1%
after | 2 | < 0.1%
prior | 2 | < 0.1%
to | 2 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
(space) | 24270 | 13.7%
n | 21838 | 12.3%
r | 19422 | 11.0%
o | 16990 | 9.6%
i | 14566 | 8.2%
a | 14566 | 8.2%
d | 9708 | 5.5%
e | 7284 | 4.1%
g | 7282 | 4.1%
u | 7282 | 4.1%
Other values (12) | 33988 | 19.2%

Most occurring categories

ValueCountFrequency (%)
(unknown) 177196
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 177196
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 177196
100.0%

Reason no cancer-directed surgery
Categorical

High correlation  Imbalance 

Distinct8
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size38.0 KiB
Surgery performed
1766 
Not recommended
525 
Not recommended, contraindicated due to other cond; autopsy only (1973-2002)
 
40
Recommended but not performed, patient refused
 
29
Recommended, unknown if performed
 
27
Other values (3)
 
43

Length

Max length76
Median length17
Mean length18.627984
Min length15

Characters and Unicode

Total characters45266
Distinct characters38
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowSurgery performed
2nd rowSurgery performed
3rd rowSurgery performed
4th rowSurgery performed
5th rowSurgery performed

Common Values

Value | Count | Frequency (%)
Surgery performed | 1766 | 72.7%
Not recommended | 525 | 21.6%
Not recommended, contraindicated due to other cond; autopsy only (1973-2002) | 40 | 1.6%
Recommended but not performed, patient refused | 29 | 1.2%
Recommended, unknown if performed | 27 | 1.1%
Recommended but not performed, unknown reason | 19 | 0.8%
Unknown; death certificate; or autopsy only (2003+) | 19 | 0.8%
Not performed, patient died prior to recommended surgery | 5 | 0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
performed | 1846 | 33.3%
surgery | 1771 | 31.9%
recommended | 645 | 11.6%
not | 618 | 11.1%
unknown | 65 | 1.2%
autopsy | 59 | 1.1%
only | 59 | 1.1%
but | 48 | 0.9%
to | 45 | 0.8%
other | 40 | 0.7%
Other values (14) | 355 | 6.4%

Most occurring characters

Value | Count | Frequency (%)
r | 7980 | 17.6%
e | 7691 | 17.0%
o | 3500 | 7.7%
d | 3354 | 7.4%
m | 3136 | 6.9%
(space) | 3121 | 6.9%
u | 1993 | 4.4%
p | 1944 | 4.3%
f | 1921 | 4.2%
y | 1889 | 4.2%
Other values (28) | 8737 | 19.3%

Most occurring categories

ValueCountFrequency (%)
(unknown) 45266
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 45266
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 45266
100.0%

Radiation recode
Categorical

Imbalance 

Distinct4
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size38.0 KiB
None/Unknown
2414 
Beam radiation
 
11
Recommended, unknown if administered
 
4
Refused (1988+)
 
1

Length

Max length36
Median length12
Mean length12.049794
Min length12

Characters and Unicode

Total characters29281
Distinct characters28
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st rowNone/Unknown
2nd rowNone/Unknown
3rd rowNone/Unknown
4th rowNone/Unknown
5th rowNone/Unknown

Common Values

Value | Count | Frequency (%)
None/Unknown | 2414 | 99.3%
Beam radiation | 11 | 0.5%
Recommended, unknown if administered | 4 | 0.2%
Refused (1988+) | 1 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
none/unknown | 2414 | 98.4%
beam | 11 | 0.4%
radiation | 11 | 0.4%
recommended | 4 | 0.2%
unknown | 4 | 0.2%
if | 4 | 0.2%
administered | 4 | 0.2%
refused | 1 | < 0.1%
1988 | 1 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
n | 9687 | 33.1%
o | 4847 | 16.6%
e | 2447 | 8.4%
w | 2418 | 8.3%
k | 2418 | 8.3%
N | 2414 | 8.2%
U | 2414 | 8.2%
/ | 2414 | 8.2%
a | 37 | 0.1%
i | 34 | 0.1%
Other values (18) | 151 | 0.5%

Most occurring categories

ValueCountFrequency (%)
(unknown) 29281
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 29281
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 29281
100.0%

Chemotherapy recode (yes, no/unk)
Categorical

High correlation 

Distinct2
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size38.0 KiB
No/Unknown
1584 
Yes
846 

Length

Max length10
Median length10
Mean length7.562963
Min length3

Characters and Unicode

Total characters18378
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNo/Unknown
2nd rowNo/Unknown
3rd rowNo/Unknown
4th rowNo/Unknown
5th rowNo/Unknown

Common Values

Value | Count | Frequency (%)
No/Unknown | 1584 | 65.2%
Yes | 846 | 34.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
no/unknown | 1584 | 65.2%
yes | 846 | 34.8%

Most occurring characters

Value | Count | Frequency (%)
n | 4752 | 25.9%
o | 3168 | 17.2%
N | 1584 | 8.6%
/ | 1584 | 8.6%
U | 1584 | 8.6%
k | 1584 | 8.6%
w | 1584 | 8.6%
Y | 846 | 4.6%
e | 846 | 4.6%
s | 846 | 4.6%

Most occurring categories

ValueCountFrequency (%)
(unknown) 18378
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 18378
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 18378
100.0%

RX Summ--Systemic/Sur Seq (2007+)
Categorical

High correlation  Imbalance 

Distinct7
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size38.0 KiB
No systemic therapy and/or surgical procedures
1851 
Systemic therapy after surgery
331 
Systemic therapy before surgery
 
138
Systemic therapy both before and after surgery
 
104
Surgery both before and after systemic therapy
 
4
Other values (2)
 
2

Length

Max length46
Median length46
Mean length42.950206
Min length16

Characters and Unicode

Total characters104369
Distinct characters28
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2 ?
Unique (%)0.1%

Sample

1st rowNo systemic therapy and/or surgical procedures
2nd rowNo systemic therapy and/or surgical procedures
3rd rowNo systemic therapy and/or surgical procedures
4th rowNo systemic therapy and/or surgical procedures
5th rowNo systemic therapy and/or surgical procedures

Common Values

Value | Count | Frequency (%)
No systemic therapy and/or surgical procedures | 1851 | 76.2%
Systemic therapy after surgery | 331 | 13.6%
Systemic therapy before surgery | 138 | 5.7%
Systemic therapy both before and after surgery | 104 | 4.3%
Surgery both before and after systemic therapy | 4 | 0.2%
Sequence unknown | 1 | < 0.1%
Intraoperative systemic therapy | 1 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
systemic | 2429 | 17.7%
therapy | 2429 | 17.7%
no | 1851 | 13.5%
and/or | 1851 | 13.5%
surgical | 1851 | 13.5%
procedures | 1851 | 13.5%
surgery | 577 | 4.2%
after | 439 | 3.2%
before | 246 | 1.8%
both | 108 | 0.8%
Other values (4) | 111 | 0.8%

Most occurring characters

Value | Count | Frequency (%)
r | 11674 | 11.2%
(space) | 11313 | 10.8%
e | 10073 | 9.7%
s | 8560 | 8.2%
a | 6680 | 6.4%
c | 6132 | 5.9%
o | 5909 | 5.7%
y | 5435 | 5.2%
t | 5407 | 5.2%
i | 4281 | 4.1%
Other values (18) | 28905 | 27.7%

Most occurring categories

ValueCountFrequency (%)
(unknown) 104369
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 104369
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 104369
100.0%

Distinct201
Distinct (%)8.3%
Missing0
Missing (%)0.0%
Memory size38.0 KiB

Length

Max length19
Median length3
Mean length5.563786
Min length3

Characters and Unicode

Total characters13520
Distinct characters25
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique65 ?
Unique (%)2.7%

Sample

1st row045
2nd row000
3rd row000
4th row025
5th row002
Value | Count | Frequency (%)
000 | 748 | 23.3%
unable | 389 | 12.1%
to | 389 | 12.1%
calculate | 389 | 12.1%
028 | 25 | 0.8%
021 | 24 | 0.7%
014 | 24 | 0.7%
023 | 23 | 0.7%
042 | 23 | 0.7%
007 | 22 | 0.7%
Other values (194) | 1153 | 35.9%

Most occurring characters

Value | Count | Frequency (%)
0 | 3649 | 27.0%
a | 1168 | 8.6%
l | 1167 | 8.6%
(space) | 779 | 5.8%
e | 778 | 5.8%
t | 778 | 5.8%
c | 778 | 5.8%
1 | 435 | 3.2%
b | 389 | 2.9%
n | 389 | 2.9%
Other values (15) | 3210 | 23.7%

Most occurring categories

ValueCountFrequency (%)
(unknown) 13520
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 13520
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
0 3649
27.0%
a 1168
 
8.6%
l 1167
 
8.6%
779
 
5.8%
e 778
 
5.8%
t 778
 
5.8%
c 778
 
5.8%
1 435
 
3.2%
b 389
 
2.9%
n 389
 
2.9%
Other values (15) 3210
23.7%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 13520
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
0 3649
27.0%
a 1168
 
8.6%
l 1167
 
8.6%
779
 
5.8%
e 778
 
5.8%
t 778
 
5.8%
c 778
 
5.8%
1 435
 
3.2%
b 389
 
2.9%
n 389
 
2.9%
Other values (15) 3210
23.7%

EOD Primary Tumor Recode (2018+)
Categorical

Alerts: High correlation, Imbalance

Distinct: 5
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 7290
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 100
2nd row: 100
3rd row: 100
4th row: 100
5th row: 100

Common Values

Value | Count | Frequency (%)
100 | 1897 | 78.1%
999 | 219 | 9.0%
700 | 168 | 6.9%
400 | 139 | 5.7%
800 | 7 | 0.3%
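
This variable carries an Imbalance alert, and the counts above show one value holding 78.1% of the rows. One common way to put a number on such imbalance is one minus the normalised Shannon entropy of the class shares: 0 for a perfectly even split, approaching 1 when a single class dominates. The sketch below applies that formula to the counts in the table. Whether the report's alert uses exactly this formula, and which threshold it applies, are assumptions here; the function name `imbalance_score` is hypothetical.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log(k), with H the Shannon entropy of the class shares."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


# Counts for codes 100, 999, 700, 400 and 800 from the Common Values table.
counts = [1897, 219, 168, 139, 7]
score = imbalance_score(counts)
print(round(score, 3))
```

Normalising by log(k) makes scores comparable between variables with different numbers of distinct values.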

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot. The table beside it repeats Common Values above.]

Most occurring characters

Value | Count | Frequency (%)
0 | 4422 | 60.7%
1 | 1897 | 26.0%
9 | 657 | 9.0%
7 | 168 | 2.3%
4 | 139 | 1.9%
8 | 7 | 0.1%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 7290 | 100.0%

The per-category, per-script and per-block character tables repeat the "Most occurring characters" table above.

EOD Regional Nodes Recode (2018+)
Categorical

Alerts: High correlation, Imbalance

Distinct: 4
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 7290
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 000
2nd row: 000
3rd row: 000
4th row: 000
5th row: 000

Common Values

Value | Count | Frequency (%)
000 | 2148 | 88.4%
999 | 210 | 8.6%
300 | 55 | 2.3%
800 | 17 | 0.7%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot. The table beside it repeats Common Values above.]

Most occurring characters

Value | Count | Frequency (%)
0 | 6588 | 90.4%
9 | 630 | 8.6%
3 | 55 | 0.8%
8 | 17 | 0.2%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 7290 | 100.0%

The per-category, per-script and per-block character tables repeat the "Most occurring characters" table above.

EOD Mets Recode (2018+)
Categorical

Alerts: High correlation, Imbalance

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 4860
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 00
2nd row: 00
3rd row: 00
4th row: 00
5th row: 00

Common Values

Value | Count | Frequency (%)
00 | 2141 | 88.1%
70 | 287 | 11.8%
10 | 2 | 0.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot. The table beside it repeats Common Values above.]

Most occurring characters

Value | Count | Frequency (%)
0 | 4571 | 94.1%
7 | 287 | 5.9%
1 | 2 | < 0.1%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 4860 | 100.0%

The per-category, per-script and per-block character tables repeat the "Most occurring characters" table above.

Distinct: 192
Distinct (%): 7.9%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Length

Max length: 67
Median length: 3
Mean length: 10.3
Min length: 3

Characters and Unicode

Total characters: 25029
Distinct characters: 43
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 44
Unique (%): 1.8%

Sample

1st row: 028
2nd row: 023
3rd row: 035
4th row: 022
5th row: 023

Value | Count | Frequency (%)
tumor | 295 | 6.1%
size | 294 | 6.1%
or | 294 | 6.1%
unreasonable | 282 | 5.8%
unknown | 282 | 5.8%
includes | 282 | 5.8%
any | 282 | 5.8%
sizes | 282 | 5.8%
401-989 | 282 | 5.8%
035 | 73 | 1.5%
Other values (203) | 2177 | 45.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 2923 | 11.7%
(space) | 2395 | 9.6%
n | 2000 | 8.0%
e | 1485 | 5.9%
s | 1448 | 5.8%
o | 1206 | 4.8%
1 | 979 | 3.9%
i | 922 | 3.7%
r | 898 | 3.6%
u | 860 | 3.4%
Other values (33) | 9913 | 39.6%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 25029 | 100.0%

The per-category, per-script and per-block character tables repeat the "Most occurring characters" table above.

Tumor Size Summary (2016+)
Real number (ℝ)

Alerts: High correlation

Distinct: 216
Distinct (%): 8.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 152.69465
Minimum: 0
Maximum: 999
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 38.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 5
Q1: 24
Median: 45
Q3: 98
95-th percentile: 999
Maximum: 999
Range: 999
Interquartile range (IQR): 74

Descriptive statistics

Standard deviation: 289.18612
Coefficient of variation (CV): 1.8938851
Kurtosis: 4.4939333
Mean: 152.69465
Median Absolute Deviation (MAD): 28
Skewness: 2.4975768
Sum: 371048
Variance: 83628.613
Monotonicity: Not monotonic
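
Several of the figures above are derived from others, so they can be cross-checked with simple arithmetic. The sketch below recomputes the mean, coefficient of variation, interquartile range, range and variance from the reported sum, standard deviation and quantiles, taking the observation count of 2430 from the report overview. The variable names are illustrative.

```python
n = 2430            # number of observations (report overview; no missing values here)
total = 371048      # reported Sum
std = 289.18612     # reported standard deviation
q1, q3 = 24, 98     # reported quartiles
minimum, maximum = 0, 999

mean = total / n                 # mean = sum / n
cv = std / mean                  # coefficient of variation = std / mean
iqr = q3 - q1                    # interquartile range = Q3 - Q1
value_range = maximum - minimum  # range = max - min
variance = std ** 2              # variance = std squared

print(mean, cv, iqr, value_range, variance)
```

The recomputed values agree with the reported mean, CV, IQR, range and variance to within rounding.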
[Figure: histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
999 | 244 | 10.0%
35 | 73 | 3.0%
40 | 68 | 2.8%
25 | 61 | 2.5%
45 | 54 | 2.2%
30 | 52 | 2.1%
20 | 47 | 1.9%
50 | 47 | 1.9%
15 | 45 | 1.9%
5 | 43 | 1.8%
Other values (206) | 1696 | 69.8%

Extreme values (10 smallest)

Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 10 | 0.4%
2 | 12 | 0.5%
3 | 25 | 1.0%
4 | 39 | 1.6%
5 | 43 | 1.8%
6 | 35 | 1.4%
7 | 27 | 1.1%
8 | 35 | 1.4%
9 | 12 | 0.5%

Extreme values (10 largest)

Value | Count | Frequency (%)
999 | 244 | 10.0%
990 | 1 | < 0.1%
989 | 2 | 0.1%
380 | 1 | < 0.1%
350 | 1 | < 0.1%
333 | 1 | < 0.1%
330 | 1 | < 0.1%
320 | 1 | < 0.1%
310 | 1 | < 0.1%
307 | 1 | < 0.1%

Regional nodes examined (1988+)
Categorical

Alerts: High correlation, Imbalance

Distinct: 44
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 4860
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 9
Unique (%): 0.4%

Sample

1st row: 00
2nd row: 07
3rd row: 02
4th row: 00
5th row: 00

Common Values

Value | Count | Frequency (%)
00 | 1874 | 77.1%
01 | 100 | 4.1%
02 | 61 | 2.5%
99 | 55 | 2.3%
03 | 51 | 2.1%
05 | 30 | 1.2%
04 | 26 | 1.1%
06 | 25 | 1.0%
07 | 16 | 0.7%
12 | 14 | 0.6%
Other values (34) | 178 | 7.3%

Length

[Figure: histogram of lengths of the category. The table that follows it in the export repeats Common Values above.]

Most occurring characters

Value | Count | Frequency (%)
0 | 4098 | 84.3%
1 | 223 | 4.6%
9 | 148 | 3.0%
2 | 119 | 2.4%
3 | 75 | 1.5%
5 | 60 | 1.2%
4 | 47 | 1.0%
6 | 38 | 0.8%
7 | 26 | 0.5%
8 | 26 | 0.5%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 4860 | 100.0%

The per-category, per-script and per-block character tables repeat the "Most occurring characters" table above.

Regional nodes positive (1988+)
Categorical

Alerts: High correlation, Imbalance

Distinct: 8
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 4860
Distinct characters: 8
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 2
Unique (%): 0.1%

Sample

1st row: 98
2nd row: 00
3rd row: 00
4th row: 98
5th row: 98

Common Values

Value | Count | Frequency (%)
98 | 1874 | 77.1%
00 | 472 | 19.4%
99 | 59 | 2.4%
01 | 16 | 0.7%
02 | 4 | 0.2%
95 | 3 | 0.1%
03 | 1 | < 0.1%
04 | 1 | < 0.1%
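
The count for code 98 here (1874) equals the count for code 00 in Regional nodes examined (1874), which suggests that 98 may stand for "no nodes examined" rather than a number of positive nodes. That reading is an assumption and should be checked against the data dictionary. The sketch below shows how the pairing could be verified row by row. It uses the first five sample rows of both variables as shown in this report; the function name is hypothetical.

```python
from collections import Counter


def cross_counts(examined, positive):
    """Count how often each (examined, positive) code pair occurs."""
    return Counter(zip(examined, positive))


# First five sample rows of each variable; codes kept as zero-padded strings.
examined = ["00", "07", "02", "00", "00"]
positive = ["98", "00", "00", "98", "98"]

pairs = cross_counts(examined, positive)

# Rows where exactly one of the two conditions holds would contradict the
# assumed rule "examined == 00 if and only if positive == 98".
mismatches = sum(n for (e, p), n in pairs.items() if (e == "00") != (p == "98"))
print(pairs, mismatches)
```

Run over the full dataset, a mismatch count of zero would support treating 98 as a structural code when computing any numeric summary of positive nodes.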

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot. The table beside it repeats Common Values above.]

Most occurring characters

Value | Count | Frequency (%)
9 | 1995 | 41.0%
8 | 1874 | 38.6%
0 | 966 | 19.9%
1 | 16 | 0.3%
2 | 4 | 0.1%
5 | 3 | 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 4860 | 100.0%

The per-category, per-script and per-block character tables repeat the "Most occurring characters" table above.

SEER Combined Mets at DX-bone (2010+)
Categorical

Alerts: High correlation, Imbalance

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Length

Max length: 7
Median length: 2
Mean length: 2.0646091
Min length: 2

Characters and Unicode

Total characters: 5017
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: No
3rd row: No
4th row: No
5th row: No

Common Values

Value | Count | Frequency (%)
No | 2393 | 98.5%
Unknown | 30 | 1.2%
Yes | 7 | 0.3%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot. The word table beside it repeats the Common Values counts with lower-cased values (no, unknown, yes).]

Most occurring characters

Value | Count | Frequency (%)
o | 2423 | 48.3%
N | 2393 | 47.7%
n | 90 | 1.8%
U | 30 | 0.6%
k | 30 | 0.6%
w | 30 | 0.6%
Y | 7 | 0.1%
e | 7 | 0.1%
s | 7 | 0.1%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 5017 | 100.0%

The per-category, per-script and per-block character tables repeat the "Most occurring characters" table above.

SEER Combined Mets at DX-brain (2010+)
Categorical

Alerts: High correlation, Imbalance

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Length

Max length: 7
Median length: 2
Mean length: 2.0666667
Min length: 2

Characters and Unicode

Total characters: 5022
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: No
3rd row: No
4th row: No
5th row: No

Common Values

Value | Count | Frequency (%)
No | 2396 | 98.6%
Unknown | 32 | 1.3%
Yes | 2 | 0.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot. The word table beside it repeats the Common Values counts with lower-cased values (no, unknown, yes).]

Most occurring characters

Value | Count | Frequency (%)
o | 2428 | 48.3%
N | 2396 | 47.7%
n | 96 | 1.9%
U | 32 | 0.6%
k | 32 | 0.6%
w | 32 | 0.6%
Y | 2 | < 0.1%
e | 2 | < 0.1%
s | 2 | < 0.1%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 5022 | 100.0%

The per-category, per-script and per-block character tables repeat the "Most occurring characters" table above.

SEER Combined Mets at DX-liver (2010+)
Categorical

Alerts: High correlation, Imbalance

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Length

Max length: 7
Median length: 2
Mean length: 2.1271605
Min length: 2

Characters and Unicode

Total characters: 5169
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: No
3rd row: No
4th row: No
5th row: No

Common Values

Value | Count | Frequency (%)
No | 2229 | 91.7%
Yes | 174 | 7.2%
Unknown | 27 | 1.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot. The word table beside it repeats the Common Values counts with lower-cased values (no, yes, unknown).]

Most occurring characters

Value | Count | Frequency (%)
o | 2256 | 43.6%
N | 2229 | 43.1%
Y | 174 | 3.4%
e | 174 | 3.4%
s | 174 | 3.4%
n | 81 | 1.6%
U | 27 | 0.5%
k | 27 | 0.5%
w | 27 | 0.5%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 5169 | 100.0%

The per-category, per-script and per-block character tables repeat the "Most occurring characters" table above.

SEER Combined Mets at DX-lung (2010+)
Categorical

Alerts: High correlation, Imbalance

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Length

Max length: 7
Median length: 2
Mean length: 2.0646091
Min length: 2

Characters and Unicode

Total characters: 5017
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: No
3rd row: No
4th row: No
5th row: No

Common Values

Value | Count | Frequency (%)
No | 2393 | 98.5%
Unknown | 30 | 1.2%
Yes | 7 | 0.3%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot. The word table beside it repeats the Common Values counts with lower-cased values (no, unknown, yes).]

Most occurring characters

Value | Count | Frequency (%)
o | 2423 | 48.3%
N | 2393 | 47.7%
n | 90 | 1.8%
U | 30 | 0.6%
k | 30 | 0.6%
w | 30 | 0.6%
Y | 7 | 0.1%
e | 7 | 0.1%
s | 7 | 0.1%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 5017 | 100.0%

The per-category, per-script and per-block character tables repeat the "Most occurring characters" table above.

Mets at DX-Distant LN (2016+)
Categorical

Alerts: High correlation, Imbalance

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Length

Max length: 34
Median length: 30
Mean length: 29.741975
Min length: 7

Characters and Unicode

Total characters: 72273
Distinct characters: 20
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: None; no lymph node metastases
2nd row: None; no lymph node metastases
3rd row: None; no lymph node metastases
4th row: None; no lymph node metastases
5th row: None; no lymph node metastases

Common Values

Value | Count | Frequency (%)
None; no lymph node metastases | 2391 | 98.4%
Unknown | 29 | 1.2%
Yes; distant lymph node metastases | 10 | 0.4%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
node | 2401 | 20.0%
lymph | 2401 | 20.0%
metastases | 2401 | 20.0%
no | 2391 | 19.9%
none | 2391 | 19.9%
unknown | 29 | 0.2%
yes | 10 | 0.1%
distant | 10 | 0.1%
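
The word counts above follow from the Common Values counts: each category label is split into lower-cased words, and every word inherits the count of the labels that contain it (for example, "node" appears in labels covering 2391 + 10 = 2401 rows). The sketch below reproduces that bookkeeping. The tokenisation rule (runs of letters and digits) is an assumption, since the report's own tokenizer may treat punctuation differently, and the function name is hypothetical.

```python
import re
from collections import Counter


def word_counts(value_counts):
    """Expand per-label counts into counts of lower-cased words."""
    words = Counter()
    for value, count in value_counts.items():
        # Assumed rule: a word is a run of ASCII letters or digits.
        for token in re.findall(r"[a-z0-9]+", value.lower()):
            words[token] += count
    return words


common_values = {
    "None; no lymph node metastases": 2391,
    "Unknown": 29,
    "Yes; distant lymph node metastases": 10,
}
words = word_counts(common_values)
print(words.most_common())
```

With these three labels the sketch yields the same eight words and counts as the table.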

Most occurring characters

Value | Count | Frequency (%)
e | 9604 | 13.3%
(space) | 9604 | 13.3%
n | 7280 | 10.1%
s | 7223 | 10.0%
o | 7212 | 10.0%
t | 4822 | 6.7%
a | 4812 | 6.7%
m | 4802 | 6.6%
d | 2411 | 3.3%
h | 2401 | 3.3%
Other values (10) | 12102 | 16.7%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 72273 | 100.0%

The per-category, per-script and per-block character tables repeat the "Most occurring characters" table above.

Mets at DX-Other (2016+)
Categorical

Alerts: High correlation, Imbalance

Distinct: 4
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Length

Max length: 79
Median length: 25
Mean length: 27.481481
Min length: 7

Characters and Unicode

Total characters: 66780
Distinct characters: 29
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: None; no other metastases
2nd row: None; no other metastases
3rd row: None; no other metastases
4th row: None; no other metastases
5th row: None; no other metastases

Common Values

Value | Count | Frequency (%)
None; no other metastases | 2268 | 93.3%
Yes; distant mets in known site(s) other than bone, brain, liver, lung, dist LN | 115 | 4.7%
Unknown | 29 | 1.2%
generalized metastases such as carinomatosis | 18 | 0.7%
0.7%

Length

2025-07-24T16:17:21.713104image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
Histogram of lengths of the category

Common Values (Plot)

2025-07-24T16:17:21.815827image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
ValueCountFrequency (%)
other 2383
22.1%
metastases 2286
21.2%
no 2268
21.0%
none 2268
21.0%
yes 115
 
1.1%
distant 115
 
1.1%
mets 115
 
1.1%
in 115
 
1.1%
known 115
 
1.1%
site(s 115
 
1.1%
Other values (12) 906
 
8.4%

Most occurring characters

ValueCountFrequency (%)
e 9852
14.8%
8371
12.5%
t 7663
11.5%
s 7620
11.4%
o 7214
10.8%
n 5579
8.4%
a 4989
7.5%
r 2649
 
4.0%
h 2516
 
3.8%
m 2419
 
3.6%
Other values (19) 7908
11.8%

Most occurring categories

ValueCountFrequency (%)
(unknown) 66780
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
e 9852
14.8%
8371
12.5%
t 7663
11.5%
s 7620
11.4%
o 7214
10.8%
n 5579
8.4%
a 4989
7.5%
r 2649
 
4.0%
h 2516
 
3.8%
m 2419
 
3.6%
Other values (19) 7908
11.8%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 66780
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
e 9852
14.8%
8371
12.5%
t 7663
11.5%
s 7620
11.4%
o 7214
10.8%
n 5579
8.4%
a 4989
7.5%
r 2649
 
4.0%
h 2516
 
3.8%
m 2419
 
3.6%
Other values (19) 7908
11.8%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 66780
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
e 9852
14.8%
8371
12.5%
t 7663
11.5%
s 7620
11.4%
o 7214
10.8%
n 5579
8.4%
a 4989
7.5%
r 2649
 
4.0%
h 2516
 
3.8%
m 2419
 
3.6%
Other values (19) 7908
11.8%

COD to site recode
Categorical

Alerts: High correlation, Imbalance

Distinct: 38
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Length

Max length: 55
Median length: 5
Mean length: 7.3222222
Min length: 5

Characters and Unicode

Total characters: 17793
Distinct characters: 50
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 13
Unique (%): 0.5%

Sample

1st row: Alive
2nd row: Alive
3rd row: Pancreas
4th row: Alive
5th row: Alive

Common Values

Value | Count | Frequency (%)
Alive | 2163 | 89.0%
In situ, benign or unknown behavior neoplasm | 57 | 2.3%
Other Cause of Death | 38 | 1.6%
Soft Tissue including Heart | 33 | 1.4%
Diseases of Heart | 25 | 1.0%
Stomach | 17 | 0.7%
Miscellaneous Malignant Cancer | 11 | 0.5%
Esophagus | 8 | 0.3%
Cerebrovascular Diseases | 7 | 0.3%
State DC not available or state DC available but no COD | 7 | 0.3%
Other values (28) | 64 | 2.6%

Length

[Figure: histogram of lengths of the category]

Value | Count | Frequency (%)
alive | 2163 | 66.5%
or | 64 | 2.0%
of | 63 | 1.9%
heart | 60 | 1.8%
situ | 57 | 1.8%
unknown | 57 | 1.8%
in | 57 | 1.8%
behavior | 57 | 1.8%
neoplasm | 57 | 1.8%
benign | 57 | 1.8%
Other values (81) | 562 | 17.3%

Most occurring characters

Value | Count | Frequency (%)
e | 2808 | 15.8%
i | 2590 | 14.6%
l | 2371 | 13.3%
v | 2261 | 12.7%
A | 2176 | 12.2%
(space) | 824 | 4.6%
n | 640 | 3.6%
a | 503 | 2.8%
o | 442 | 2.5%
s | 436 | 2.5%
Other values (40) | 2742 | 15.4%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 17793 | 100.0%

The per-category, per-script and per-block character tables repeat the "Most occurring characters" table above.

SEER cause-specific death classification
Categorical

Alerts: High correlation, Imbalance

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Length

Max length: 37
Median length: 28
Mean length: 28.460905
Min length: 26

Characters and Unicode

Total characters: 69160
Distinct characters: 29
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Alive or dead of other cause
2nd row: Alive or dead of other cause
3rd row: Alive or dead of other cause
4th row: Alive or dead of other cause
5th row: Alive or dead of other cause

Common Values

Value | Count | Frequency (%)
Alive or dead of other cause | 2297 | 94.5%
Dead (attributable to this cancer dx) | 126 | 5.2%
Dead (missing/unknown COD) | 7 | 0.3%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
dead | 2430 | 16.7%
alive | 2297 | 15.8%
or | 2297 | 15.8%
of | 2297 | 15.8%
other | 2297 | 15.8%
cause | 2297 | 15.8%
attributable | 126 | 0.9%
to | 126 | 0.9%
this | 126 | 0.9%
cancer | 126 | 0.9%
Other values (3) | 140 | 1.0%

Most occurring characters

Value | Count | Frequency (%)
(space) | 12129 | 17.5%
e | 9573 | 13.8%
o | 7024 | 10.2%
a | 5105 | 7.4%
d | 4853 | 7.0%
r | 4846 | 7.0%
t | 2927 | 4.2%
i | 2563 | 3.7%
c | 2549 | 3.7%
s | 2437 | 3.5%
Other values (19) | 15154 | 21.9%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 69160 | 100.0%

The per-category, per-script and per-block character tables repeat the "Most occurring characters" table above.

SEER other cause of death classification
Categorical

Alerts: High correlation, Imbalance

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Length

Max length: 55
Median length: 27
Mean length: 28.541152
Min length: 26

Characters and Unicode

Total characters: 69355
Distinct characters: 28
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Alive or dead due to cancer
2nd row: Alive or dead due to cancer
3rd row: Dead (attributable to causes other than this cancer dx)
4th row: Alive or dead due to cancer
5th row: Alive or dead due to cancer

Common Values

Value | Count | Frequency (%)
Alive or dead due to cancer | 2289 | 94.2%
Dead (attributable to causes other than this cancer dx) | 134 | 5.5%
Dead (missing/unknown COD) | 7 | 0.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
dead 2430
16.2%
to 2423
16.2%
cancer 2423
16.2%
alive 2289
15.3%
or 2289
15.3%
due 2289
15.3%
attributable 134
 
0.9%
causes 134
 
0.9%
other 134
 
0.9%
than 134
 
0.9%
Other values (4) 282
 
1.9%

Most occurring characters

ValueCountFrequency (%)
12531
18.1%
e 9833
14.2%
d 7142
10.3%
a 5389
7.8%
r 4980
 
7.2%
c 4980
 
7.2%
o 4853
 
7.0%
t 3227
 
4.7%
n 2585
 
3.7%
i 2571
 
3.7%
Other values (18) 11264
16.2%

Most occurring categories

ValueCountFrequency (%)
(unknown) 69355
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 69355
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 69355
100.0%

Distinct 61
Distinct (%) 2.5%
Missing 0
Missing (%) 0.0%
Memory size 38.0 KiB

Length

Max length 7
Median length 4
Mean length 4.0049383
Min length 4

Characters and Unicode

Total characters 9732
Distinct characters 15
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0007
2nd row 0025
3rd row 0027
4th row 0043
5th row 0012
ValueCountFrequency (%)
0000 126
 
5.2%
0001 89
 
3.7%
0003 85
 
3.5%
0004 80
 
3.3%
0006 79
 
3.3%
0002 77
 
3.2%
0005 76
 
3.1%
0009 76
 
3.1%
0010 74
 
3.0%
0008 73
 
3.0%
Other values (51) 1595
65.6%

Most occurring characters

ValueCountFrequency (%)
0 6014
61.8%
1 920
 
9.5%
2 593
 
6.1%
3 467
 
4.8%
4 449
 
4.6%
5 389
 
4.0%
6 238
 
2.4%
9 238
 
2.4%
8 199
 
2.0%
7 197
 
2.0%
Other values (5) 28
 
0.3%

Most occurring categories

ValueCountFrequency (%)
(unknown) 9732
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 9732
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 9732
100.0%

Survival months flag
Categorical

High correlation  Imbalance 

Distinct 5
Distinct (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory size 38.0 KiB

Complete dates are available and there are more than 0 days of survival 2379
Complete dates are available and there are 0 days of survival 37
Incomplete dates are available and there cannot be zero days of follow-up 9
Not calculated because a Death Certificate Only or Autopsy Only case 4
Incomplete dates are available and there could be zero days of follow-up 1

Length

Max length 73
Median length 71
Mean length 70.850617
Min length 61

Characters and Unicode

Total characters 172167
Distinct characters 30
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) < 0.1%

Sample

1st row Complete dates are available and there are more than 0 days of survival
2nd row Complete dates are available and there are more than 0 days of survival
3rd row Complete dates are available and there are more than 0 days of survival
4th row Complete dates are available and there are more than 0 days of survival
5th row Complete dates are available and there are more than 0 days of survival

Common Values

ValueCountFrequency (%)
Complete dates are available and there are more than 0 days of survival 2379
97.9%
Complete dates are available and there are 0 days of survival 37
 
1.5%
Incomplete dates are available and there cannot be zero days of follow-up 9
 
0.4%
Not calculated because a Death Certificate Only or Autopsy Only case 4
 
0.2%
Incomplete dates are available and there could be zero days of follow-up 1
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
are 4842
15.4%
dates 2426
7.7%
available 2426
7.7%
and 2426
7.7%
of 2426
7.7%
there 2426
7.7%
days 2426
7.7%
complete 2416
7.7%
0 2416
7.7%
survival 2416
7.7%
Other values (18) 4852
15.4%

Most occurring characters

ValueCountFrequency (%)
29068
16.9%
a 24230
14.1%
e 21825
12.7%
r 12081
 
7.0%
l 9731
 
5.7%
t 9690
 
5.6%
d 7283
 
4.2%
o 7283
 
4.2%
s 7280
 
4.2%
v 7258
 
4.2%
Other values (20) 36438
21.2%

Most occurring categories

ValueCountFrequency (%)
(unknown) 172167
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 172167
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 172167
100.0%

COD to site rec KM
Categorical

High correlation  Imbalance 

Distinct 38
Distinct (%) 1.6%
Missing 0
Missing (%) 0.0%
Memory size 38.0 KiB

Alive 2163
In situ, benign or unknown behavior neoplasm 57
Other Cause of Death 38
Soft Tissue including Heart 33
Diseases of Heart 25
Other values (33) 114

Length

Max length 55
Median length 5
Mean length 7.3222222
Min length 5

Characters and Unicode

Total characters 17793
Distinct characters 50
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 13
Unique (%) 0.5%

Sample

1st row Alive
2nd row Alive
3rd row Pancreas
4th row Alive
5th row Alive

Common Values

ValueCountFrequency (%)
Alive 2163
89.0%
In situ, benign or unknown behavior neoplasm 57
 
2.3%
Other Cause of Death 38
 
1.6%
Soft Tissue including Heart 33
 
1.4%
Diseases of Heart 25
 
1.0%
Stomach 17
 
0.7%
Miscellaneous Malignant Cancer 11
 
0.5%
Esophagus 8
 
0.3%
Cerebrovascular Diseases 7
 
0.3%
State DC not available or state DC available but no COD 7
 
0.3%
Other values (28) 64
 
2.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
alive 2163
66.5%
or 64
 
2.0%
of 63
 
1.9%
heart 60
 
1.8%
situ 57
 
1.8%
unknown 57
 
1.8%
in 57
 
1.8%
behavior 57
 
1.8%
neoplasm 57
 
1.8%
benign 57
 
1.8%
Other values (81) 562
 
17.3%

Most occurring characters

ValueCountFrequency (%)
e 2808
15.8%
i 2590
14.6%
l 2371
13.3%
v 2261
12.7%
A 2176
12.2%
824
 
4.6%
n 640
 
3.6%
a 503
 
2.8%
o 442
 
2.5%
s 436
 
2.5%
Other values (40) 2742
15.4%

Most occurring categories

ValueCountFrequency (%)
(unknown) 17793
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 17793
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 17793
100.0%

COD to site recode ICD-O-3 2023 Revision
Categorical

High correlation  Imbalance 

Distinct 41
Distinct (%) 1.7%
Missing 0
Missing (%) 0.0%
Memory size 38.0 KiB

Alive 2163
Benign and Borderline: All Other sites 56
Soft Tissue 33
Other COD 30
Stomach 17
Other values (36) 131

Length

Max length 78
Median length 5
Mean length 7.1547325
Min length 5

Characters and Unicode

Total characters 17386
Distinct characters 56
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 13
Unique (%) 0.5%

Sample

1st row Alive
2nd row Alive
3rd row Pancreas
4th row Alive
5th row Alive

Common Values

ValueCountFrequency (%)
Alive 2163
89.0%
Benign and Borderline: All Other sites 56
 
2.3%
Soft Tissue 33
 
1.4%
Other COD 30
 
1.2%
Stomach 17
 
0.7%
Other and unspecified disorders of the circulatory system 16
 
0.7%
Miscellaneous Neoplasms 11
 
0.5%
Ischemic heart disease 8
 
0.3%
Esophagus 8
 
0.3%
Cerebrovascular diseases 7
 
0.3%
Other values (31) 81
 
3.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
alive 2163
68.7%
other 108
 
3.4%
and 103
 
3.3%
borderline 57
 
1.8%
benign 57
 
1.8%
all 56
 
1.8%
sites 56
 
1.8%
cod 37
 
1.2%
soft 33
 
1.0%
tissue 33
 
1.0%
Other values (91) 446
 
14.2%

Most occurring characters

ValueCountFrequency (%)
e 2868
16.5%
i 2586
14.9%
l 2479
14.3%
A 2247
12.9%
v 2206
12.7%
719
 
4.1%
s 463
 
2.7%
n 424
 
2.4%
r 396
 
2.3%
t 369
 
2.1%
Other values (46) 2629
15.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 17386
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 17386
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 17386
100.0%

COD to site recode ICD-O-3 2023 Revision Expanded (1999+)
Categorical

High correlation  Imbalance 

Distinct 42
Distinct (%) 1.7%
Missing 0
Missing (%) 0.0%
Memory size 38.0 KiB

Alive 2163
Benign and Borderline: All Other sites 56
Soft Tissue 33
Other COD 30
Stomach 17
Other values (37) 131

Length

Max length 78
Median length 5
Mean length 7.1695473
Min length 5

Characters and Unicode

Total characters 17422
Distinct characters 57
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 13
Unique (%) 0.5%

Sample

1st row Alive
2nd row Alive
3rd row Pancreas
4th row Alive
5th row Alive

Common Values

ValueCountFrequency (%)
Alive 2163
89.0%
Benign and Borderline: All Other sites 56
 
2.3%
Soft Tissue 33
 
1.4%
Other COD 30
 
1.2%
Stomach 17
 
0.7%
Other and unspecified disorders of the circulatory system 16
 
0.7%
Esophagus 8
 
0.3%
Ischemic heart disease 8
 
0.3%
Miscellaneous Neoplasms 8
 
0.3%
Cerebrovascular diseases 7
 
0.3%
Other values (32) 84
 
3.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
alive 2163
68.6%
other 109
 
3.5%
and 102
 
3.2%
benign 57
 
1.8%
borderline 57
 
1.8%
all 56
 
1.8%
sites 56
 
1.8%
cod 37
 
1.2%
soft 33
 
1.0%
tissue 33
 
1.0%
Other values (93) 450
 
14.3%

Most occurring characters

ValueCountFrequency (%)
e 2874
16.5%
i 2589
14.9%
l 2482
14.2%
A 2247
12.9%
v 2205
12.7%
723
 
4.1%
s 460
 
2.6%
n 420
 
2.4%
r 396
 
2.3%
t 375
 
2.2%
Other values (47) 2651
15.2%

Most occurring categories

ValueCountFrequency (%)
(unknown) 17422
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 17422
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 17422
100.0%

Vital status recode (study cutoff used)
Categorical

High correlation  Imbalance 

Distinct 2
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory size 38.0 KiB

Alive 2163
Dead 267

Length

Max length 5
Median length 5
Mean length 4.8901235
Min length 4

Characters and Unicode

Total characters 11883
Distinct characters 8
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Alive
2nd row Alive
3rd row Dead
4th row Alive
5th row Alive

Common Values

ValueCountFrequency (%)
Alive 2163
89.0%
Dead 267
 
11.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
alive 2163
89.0%
dead 267
 
11.0%

Most occurring characters

ValueCountFrequency (%)
e 2430
20.4%
A 2163
18.2%
i 2163
18.2%
l 2163
18.2%
v 2163
18.2%
D 267
 
2.2%
a 267
 
2.2%
d 267
 
2.2%

Most occurring categories

ValueCountFrequency (%)
(unknown) 11883
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 11883
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 11883
100.0%

Sequence number
Categorical

High correlation  Imbalance 

Distinct 6
Distinct (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory size 38.0 KiB

One primary only 1784
2nd of 2 or more primaries 417
1st of 2 or more primaries 116
3rd of 3 or more primaries 90
4th of 4 or more primaries 22

Length

Max length 26
Median length 16
Mean length 18.658436
Min length 16

Characters and Unicode

Total characters 45340
Distinct characters 22
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) < 0.1%

Sample

1st row One primary only
2nd row One primary only
3rd row 2nd of 2 or more primaries
4th row 2nd of 2 or more primaries
5th row 2nd of 2 or more primaries

Common Values

ValueCountFrequency (%)
One primary only 1784
73.4%
2nd of 2 or more primaries 417
 
17.2%
1st of 2 or more primaries 116
 
4.8%
3rd of 3 or more primaries 90
 
3.7%
4th of 4 or more primaries 22
 
0.9%
5th of 5 or more primaries 1
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
one 1784
19.3%
primary 1784
19.3%
only 1784
19.3%
of 646
 
7.0%
more 646
 
7.0%
or 646
 
7.0%
primaries 646
 
7.0%
2 533
 
5.8%
2nd 417
 
4.5%
1st 116
 
1.3%
Other values (6) 226
 
2.4%

Most occurring characters

ValueCountFrequency (%)
6798
15.0%
r 6242
13.8%
n 3985
8.8%
o 3722
8.2%
y 3568
7.9%
e 3076
6.8%
m 3076
6.8%
i 3076
6.8%
a 2430
 
5.4%
p 2430
 
5.4%
Other values (12) 6937
15.3%

Most occurring categories

ValueCountFrequency (%)
(unknown) 45340
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 45340
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 45340
100.0%

First malignant primary indicator
Boolean

High correlation 

Distinct 2
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory size 21.4 KiB

Value Count Frequency (%)
True 1931 79.5%
False 499 20.5%

Distinct 2
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory size 21.4 KiB

Value Count Frequency (%)
True 2421 99.6%
False 9 0.4%

Record number recode
Categorical

High correlation  Imbalance 

Distinct 5
Distinct (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory size 38.0 KiB

1 1878
2 447
3 79
4 24
5 2

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 2430
Distinct characters 5
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 2
2nd row 2
3rd row 2
4th row 2
5th row 2

Common Values

ValueCountFrequency (%)
1 1878
77.3%
2 447
 
18.4%
3 79
 
3.3%
4 24
 
1.0%
5 2
 
0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1 1878
77.3%
2 447
 
18.4%
3 79
 
3.3%
4 24
 
1.0%
5 2
 
0.1%

Most occurring characters

ValueCountFrequency (%)
1 1878
77.3%
2 447
 
18.4%
3 79
 
3.3%
4 24
 
1.0%
5 2
 
0.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 2430
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 2430
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 2430
100.0%

Total number of in situ/malignant tumors for patient
Categorical

High correlation  Imbalance 

Distinct 5
Distinct (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory size 38.0 KiB

1 1809
2 477
3 114
4 26
5 4

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 2430
Distinct characters 5
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 1
2nd row 1
3rd row 2
4th row 2
5th row 2

Common Values

ValueCountFrequency (%)
1 1809
74.4%
2 477
 
19.6%
3 114
 
4.7%
4 26
 
1.1%
5 4
 
0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1 1809
74.4%
2 477
 
19.6%
3 114
 
4.7%
4 26
 
1.1%
5 4
 
0.2%

Most occurring characters

ValueCountFrequency (%)
1 1809
74.4%
2 477
 
19.6%
3 114
 
4.7%
4 26
 
1.1%
5 4
 
0.2%

Most occurring categories

ValueCountFrequency (%)
(unknown) 2430
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 2430
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 2430
100.0%

Distinct 4
Distinct (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory size 38.0 KiB

0 2390
1 38
3 1
2 1

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 2430
Distinct characters 4
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 2
Unique (%) 0.1%

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Common Values

ValueCountFrequency (%)
0 2390
98.4%
1 38
 
1.6%
3 1
 
< 0.1%
2 1
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 2390
98.4%
1 38
 
1.6%
3 1
 
< 0.1%
2 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
0 2390
98.4%
1 38
 
1.6%
3 1
 
< 0.1%
2 1
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 2430
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 2430
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 2430
100.0%

Distinct 72
Distinct (%) 3.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 65.072016
Minimum 12
Maximum 90
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 38.0 KiB

Quantile statistics

Minimum 12
5-th percentile 41
Q1 57
median 66
Q3 74
95-th percentile 85
Maximum 90
Range 78
Interquartile range (IQR) 17

Descriptive statistics

Standard deviation 13.345527
Coefficient of variation (CV) 0.20508857
Kurtosis 0.13248504
Mean 65.072016
Median Absolute Deviation (MAD) 9
Skewness -0.50278075
Sum 158125
Variance 178.10309
Monotonicity Not monotonic
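
Several of the figures listed for this variable are derived from the others: range is maximum minus minimum, IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below re-derives them from the reported values as a consistency check. The function name `derived_stats` is hypothetical and is not part of the profiling tool; the inputs are copied from the statistics above, so results agree only to the displayed precision.

```python
def derived_stats(minimum, maximum, q1, q3, mean, std):
    """Re-derive summary figures from the basic statistics of one variable."""
    return {
        "range": maximum - minimum,   # spread between the extremes
        "iqr": q3 - q1,               # interquartile range
        "cv": std / mean,             # coefficient of variation
        "variance": std ** 2,         # squared standard deviation
    }


# Values copied from the report for this variable.
stats = derived_stats(minimum=12, maximum=90, q1=57, q3=74,
                      mean=65.072016, std=13.345527)
print(stats)
```

Running the same check on other numeric variables is a quick way to catch transcription errors when figures are copied out of the report by hand.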
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
70 86
 
3.5%
72 85
 
3.5%
71 83
 
3.4%
69 81
 
3.3%
67 79
 
3.3%
65 74
 
3.0%
64 72
 
3.0%
62 71
 
2.9%
60 71
 
2.9%
74 69
 
2.8%
Other values (62) 1659
68.3%
ValueCountFrequency (%)
12 1
 
< 0.1%
14 1
 
< 0.1%
18 1
 
< 0.1%
22 1
 
< 0.1%
23 3
0.1%
24 2
0.1%
25 3
0.1%
26 2
0.1%
27 3
0.1%
28 3
0.1%
ValueCountFrequency (%)
90 42
1.7%
89 11
 
0.5%
88 20
0.8%
87 22
0.9%
86 25
1.0%
85 31
1.3%
84 37
1.5%
83 34
1.4%
82 27
1.1%
81 39
1.6%

Year of follow-up recode
Categorical

High correlation  Imbalance 

Distinct 5
Distinct (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory size 38.0 KiB

2022 2211
2021 100
2020 65
2019 39
2018 15

Length

Max length 4
Median length 4
Mean length 4
Min length 4

Characters and Unicode

Total characters 9720
Distinct characters 5
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 2022
2nd row 2022
3rd row 2021
4th row 2022
5th row 2022

Common Values

ValueCountFrequency (%)
2022 2211
91.0%
2021 100
 
4.1%
2020 65
 
2.7%
2019 39
 
1.6%
2018 15
 
0.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2022 2211
91.0%
2021 100
 
4.1%
2020 65
 
2.7%
2019 39
 
1.6%
2018 15
 
0.6%

Most occurring characters

ValueCountFrequency (%)
2 7017
72.2%
0 2495
 
25.7%
1 154
 
1.6%
9 39
 
0.4%
8 15
 
0.2%

Most occurring categories

ValueCountFrequency (%)
(unknown) 9720
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 9720
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 9720
100.0%

Patient ID
Real number (ℝ)

High correlation 

Distinct 2424
Distinct (%) 99.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 12075882
Minimum 812
Maximum 22445878
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 38.0 KiB

Quantile statistics

Minimum 812
5-th percentile 1527866.1
Q1 5966362.8
median 11394506
Q3 17031363
95-th percentile 22386542
Maximum 22445878
Range 22445066
Interquartile range (IQR) 11065000

Descriptive statistics

Standard deviation 7536067.2
Coefficient of variation (CV) 0.62405936
Kurtosis -1.398399
Mean 12075882
Median Absolute Deviation (MAD) 5589215
Skewness -0.051652279
Sum 2.9344393 × 10^10
Variance 5.679231 × 10^13
Monotonicity Increasing
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
11363912 2
 
0.1%
1489143 2
 
0.1%
5978043 2
 
0.1%
6020990 2
 
0.1%
1522113 2
 
0.1%
11341792 2
 
0.1%
641443 1
 
< 0.1%
651626 1
 
< 0.1%
22424103 1
 
< 0.1%
851907 1
 
< 0.1%
Other values (2414) 2414
99.3%
ValueCountFrequency (%)
812 1
< 0.1%
19511 1
< 0.1%
200360 1
< 0.1%
259988 1
< 0.1%
511662 1
< 0.1%
544070 1
< 0.1%
641443 1
< 0.1%
651626 1
< 0.1%
654059 1
< 0.1%
686799 1
< 0.1%
ValueCountFrequency (%)
22445878 1
< 0.1%
22445847 1
< 0.1%
22443797 1
< 0.1%
22443698 1
< 0.1%
22443682 1
< 0.1%
22443676 1
< 0.1%
22443657 1
< 0.1%
22443599 1
< 0.1%
22442404 1
< 0.1%
22442286 1
< 0.1%

Type of Reporting Source
Categorical

High correlation  Imbalance 

Distinct 6
Distinct (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory size 38.0 KiB

Hospital inpatient/outpatient or clinic 2235
Laboratory only (hospital or private) 148
Other hospital outpatient unit or surgery center (2006+) 27
Radiation treatment or medical oncology center (2006+) 11
Physicians office/private medical practitioner (LMD) 5

Length

Max length 56
Median length 39
Mean length 39.117284
Min length 12

Characters and Unicode

Total characters 95055
Distinct characters 36
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Hospital inpatient/outpatient or clinic
2nd row Hospital inpatient/outpatient or clinic
3rd row Hospital inpatient/outpatient or clinic
4th row Hospital inpatient/outpatient or clinic
5th row Hospital inpatient/outpatient or clinic

Common Values

ValueCountFrequency (%)
Hospital inpatient/outpatient or clinic 2235
92.0%
Laboratory only (hospital or private) 148
 
6.1%
Other hospital outpatient unit or surgery center (2006+) 27
 
1.1%
Radiation treatment or medical oncology center (2006+) 11
 
0.5%
Physicians office/private medical practitioner (LMD) 5
 
0.2%
Autopsy only 4
 
0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
or 2421
24.2%
hospital 2410
24.1%
inpatient/outpatient 2235
22.3%
clinic 2235
22.3%
only 152
 
1.5%
laboratory 148
 
1.5%
private 148
 
1.5%
center 38
 
0.4%
2006 38
 
0.4%
other 27
 
0.3%
Other values (12) 154
 
1.5%

Most occurring characters

ValueCountFrequency (%)
t 14117
14.9%
i 13855
14.6%
n 9227
9.7%
o 7599
8.0%
7576
8.0%
a 7415
7.8%
p 7069
7.4%
e 4828
 
5.1%
l 4824
 
5.1%
c 4550
 
4.8%
Other values (26) 13995
14.7%

Most occurring categories

ValueCountFrequency (%)
(unknown) 95055
100.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 95055
100.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 95055
100.0%


Marital status at diagnosis
Categorical

Distinct: 7
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Value | Count
Married (including common law) | 1349
Single (never married) | 396
Widowed | 239
Unknown | 200
Divorced | 198
Other values (2) | 48

Length

Max length: 30
Median length: 30
Mean length: 22.514815
Min length: 7

Characters and Unicode

Total characters: 54711
Distinct characters: 27
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
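
As a rough illustration of the character properties mentioned above, the following minimal Python sketch tallies Unicode general categories using the standard-library unicodedata module. The helper name char_category_counts and the three sample values are assumptions chosen for illustration. This is not the code the profiling report itself runs, and it covers general categories only, not scripts or blocks.

```python
import unicodedata
from collections import Counter


def char_category_counts(values):
    """Count Unicode general categories over every character in the given strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            # unicodedata.category returns a two-letter code such as "Lu" or "Ll".
            counts[unicodedata.category(ch)] += 1
    return counts


# Hypothetical sample: three of the category labels that appear in this variable.
sample_values = ["Widowed", "Unknown", "Divorced"]
category_counts = char_category_counts(sample_values)

for code, count in category_counts.most_common():
    print(code, count)
```

In this sketch "Lu" stands for uppercase letters and "Ll" for lowercase letters; a column with spaces or punctuation would add codes such as "Zs" or "Ps".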

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Married (including common law)
2nd row: Single (never married)
3rd row: Unknown
4th row: Married (including common law)
5th row: Single (never married)

Common Values

Value | Count | Frequency (%)
Married (including common law) | 1349 | 55.5%
Single (never married) | 396 | 16.3%
Widowed | 239 | 9.8%
Unknown | 200 | 8.2%
Divorced | 198 | 8.1%
Separated | 26 | 1.1%
Unmarried or Domestic Partner | 22 | 0.9%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values]

Value | Count | Frequency (%)
married | 1745 | 23.8%
including | 1349 | 18.4%
common | 1349 | 18.4%
law | 1349 | 18.4%
single | 396 | 5.4%
never | 396 | 5.4%
widowed | 239 | 3.3%
unknown | 200 | 2.7%
divorced | 198 | 2.7%
separated | 26 | 0.4%
Other values (4) | 88 | 1.2%

Most occurring characters

Value | Count | Frequency (%)
n | 5483 | 10.0%
i | 5320 | 9.7%
(space) | 4905 | 9.0%
r | 4220 | 7.7%
d | 3818 | 7.0%
e | 3488 | 6.4%
o | 3379 | 6.2%
a | 3190 | 5.8%
m | 3138 | 5.7%
l | 3094 | 5.7%
Other values (17) | 14676 | 26.8%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 54711 | 100.0%

All characters fall into a single (unknown) category, script and block, so the most frequent characters per category, per script and per block are the same as those listed under Most occurring characters.

CoC Accredited Flag (2018+)
Categorical

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Value | Count
ANALYTIC abstract from facility WITH CoC accreditation | 1802
Abstract from facility WITHOUT CoC accreditation | 449
NON-ANALYTIC abstract from facility WITH CoC accreditation | 179

Length

Max length: 58
Median length: 54
Mean length: 53.186008
Min length: 48

Characters and Unicode

Total characters: 129242
Distinct characters: 28
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ANALYTIC abstract from facility WITH CoC accreditation
2nd row: ANALYTIC abstract from facility WITH CoC accreditation
3rd row: ANALYTIC abstract from facility WITH CoC accreditation
4th row: Abstract from facility WITHOUT CoC accreditation
5th row: ANALYTIC abstract from facility WITH CoC accreditation

Common Values

Value | Count | Frequency (%)
ANALYTIC abstract from facility WITH CoC accreditation | 1802 | 74.2%
Abstract from facility WITHOUT CoC accreditation | 449 | 18.5%
NON-ANALYTIC abstract from facility WITH CoC accreditation | 179 | 7.4%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values]

Value | Count | Frequency (%)
abstract | 2430 | 14.7%
accreditation | 2430 | 14.7%
from | 2430 | 14.7%
facility | 2430 | 14.7%
coc | 2430 | 14.7%
with | 1981 | 12.0%
analytic | 1802 | 10.9%
without | 449 | 2.7%
non-analytic | 179 | 1.1%

Most occurring characters

Value | Count | Frequency (%)
(space) | 14131 | 10.9%
t | 12150 | 9.4%
a | 11701 | 9.1%
c | 9720 | 7.5%
i | 9720 | 7.5%
o | 7290 | 5.6%
r | 7290 | 5.6%
C | 6841 | 5.3%
T | 4860 | 3.8%
f | 4860 | 3.8%
Other values (18) | 40679 | 31.5%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 129242 | 100.0%

All characters fall into a single (unknown) category, script and block, so the most frequent characters per category, per script and per block are the same as those listed under Most occurring characters.

Median household income inflation adj to 2023
Categorical

High correlation 

Distinct: 16
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Value | Count
$120,000+ | 641
$95,000 - $99,999 | 259
$100,000 - $109,999 | 254
$85,000 - $89,999 | 244
$80,000 - $84,999 | 207
Other values (11) | 825

Length

Max length: 19
Median length: 17
Mean length: 15.182716
Min length: 9

Characters and Unicode

Total characters: 36894
Distinct characters: 15
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: $120,000+
2nd row: $120,000+
3rd row: $120,000+
4th row: $120,000+
5th row: $120,000+

Common Values

Value | Count | Frequency (%)
$120,000+ | 641 | 26.4%
$95,000 - $99,999 | 259 | 10.7%
$100,000 - $109,999 | 254 | 10.5%
$85,000 - $89,999 | 244 | 10.0%
$80,000 - $84,999 | 207 | 8.5%
$75,000 - $79,999 | 202 | 8.3%
$90,000 - $94,999 | 190 | 7.8%
$110,000 - $119,999 | 106 | 4.4%
$65,000 - $69,999 | 91 | 3.7%
$70,000 - $74,999 | 73 | 3.0%
Other values (6) | 163 | 6.7%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
(blank) | 1789 | 29.8%
120,000 | 641 | 10.7%
95,000 | 259 | 4.3%
99,999 | 259 | 4.3%
100,000 | 254 | 4.2%
109,999 | 254 | 4.2%
85,000 | 244 | 4.1%
89,999 | 244 | 4.1%
80,000 | 207 | 3.4%
84,999 | 207 | 3.4%
Other values (20) | 1649 | 27.5%

Most occurring characters

Value | Count | Frequency (%)
0 | 9364 | 25.4%
9 | 7486 | 20.3%
$ | 4218 | 11.4%
, | 4218 | 11.4%
(space) | 3577 | 9.7%
- | 1788 | 4.8%
1 | 1573 | 4.3%
5 | 1040 | 2.8%
8 | 902 | 2.4%
+ | 641 | 1.7%
Other values (5) | 2087 | 5.7%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 36894 | 100.0%

All characters fall into a single (unknown) category, script and block, so the most frequent characters per category, per script and per block are the same as those listed under Most occurring characters.

Rural-Urban Continuum Code
Categorical

Distinct: 5
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 38.0 KiB

Value | Count
Counties in metropolitan areas ge 1 million pop | 1592
Counties in metropolitan areas of 250,000 to 1 million pop | 502
Counties in metropolitan areas of lt 250 thousand pop | 133
Nonmetropolitan counties adjacent to a metropolitan area | 106
Nonmetropolitan counties not adjacent to a metropolitan area | 97

Length

Max length: 60
Median length: 47
Mean length: 50.512346
Min length: 47

Characters and Unicode

Total characters: 122745
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Counties in metropolitan areas ge 1 million pop
2nd row: Counties in metropolitan areas ge 1 million pop
3rd row: Counties in metropolitan areas ge 1 million pop
4th row: Counties in metropolitan areas ge 1 million pop
5th row: Counties in metropolitan areas ge 1 million pop

Common Values

Value | Count | Frequency (%)
Counties in metropolitan areas ge 1 million pop | 1592 | 65.5%
Counties in metropolitan areas of 250,000 to 1 million pop | 502 | 20.7%
Counties in metropolitan areas of lt 250 thousand pop | 133 | 5.5%
Nonmetropolitan counties adjacent to a metropolitan area | 106 | 4.4%
Nonmetropolitan counties not adjacent to a metropolitan area | 97 | 4.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values]

Value | Count | Frequency (%)
counties | 2430 | 11.9%
metropolitan | 2430 | 11.9%
in | 2227 | 10.9%
areas | 2227 | 10.9%
pop | 2227 | 10.9%
million | 2094 | 10.2%
1 | 2094 | 10.2%
ge | 1592 | 7.8%
to | 705 | 3.4%
of | 635 | 3.1%
Other values (9) | 1810 | 8.8%

Most occurring characters

Value | Count | Frequency (%)
(space) | 18041 | 14.7%
o | 13790 | 11.2%
i | 11478 | 9.4%
n | 10020 | 8.2%
e | 9288 | 7.6%
t | 8967 | 7.3%
a | 8235 | 6.7%
p | 7087 | 5.8%
l | 6954 | 5.7%
r | 5063 | 4.1%
Other values (16) | 23822 | 19.4%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 122745 | 100.0%

All characters fall into a single (unknown) category, script and block, so the most frequent characters per category, per script and per block are the same as those listed under Most occurring characters.

Interactions

[Figures: 16 interaction plots]

Correlations

AJCC ID (2018+) | Age recode with single ages and 90+ | COD to site rec KM | COD to site recode | COD to site recode ICD-O-3 2023 Revision | COD to site recode ICD-O-3 2023 Revision Expanded (1999+) | Chemotherapy recode (yes, no/unk) | CoC Accredited Flag (2018+) | Derived EOD 2018 M Recode (2018+) | Derived EOD 2018 N Recode (2018+) | Derived EOD 2018 Stage Group Recode (2018+) | Derived EOD 2018 T Recode (2018+) | Derived Summary Grade 2018 (2018+) | Diagnostic Confirmation | EOD Mets Recode (2018+) | EOD Primary Tumor Recode (2018+) | EOD Regional Nodes Recode (2018+) | First malignant primary indicator | Grade Clinical (2018+) | Grade Pathological (2018+) | Marital status at diagnosis | Median household income inflation adj to 2023 | Mets at DX-Distant LN (2016+) | Mets at DX-Other (2016+) | PRCDA 2020 | Patient ID | Primary Site | Primary Site - labeled | Primary by international rules | RX Summ--Scope Reg LN Sur (2003+) | RX Summ--Surg Oth Reg/Dis (2003+) | RX Summ--Surg Prim Site (1998+) | RX Summ--Surg/Rad Seq | RX Summ--Systemic/Sur Seq (2007+) | Race recode (White, Black, Other) | Radiation recode | Reason no cancer-directed surgery | Record number recode | Regional nodes examined (1988+) | Regional nodes positive (1988+) | Rural-Urban Continuum Code | SEER Combined Mets at DX-bone (2010+) | SEER Combined Mets at DX-brain (2010+) | SEER Combined Mets at DX-liver (2010+) | SEER Combined Mets at DX-lung (2010+) | SEER cause-specific death classification | SEER other cause of death classification | Sequence number | Sex | Site recode ICD-O-3 2023 Revision Expanded | Survival months flag | Total number of benign/borderline tumors for patient | Total number of in situ/malignant tumors for patient | Tumor Size Summary (2016+) | Type of Reporting Source | Vital status recode (study cutoff used) | Year of diagnosis | Year of follow-up recode
AJCC ID (2018+)1.0000.0750.1240.1240.1310.1320.1940.0920.7120.7070.8460.7130.1410.1080.1780.2800.1650.0120.0500.1380.0410.0760.1180.1650.0120.0920.6910.9910.0000.3490.0700.3910.0000.1650.0920.0000.1930.0000.2090.2080.0550.1350.1180.1740.1350.1140.0250.0130.0320.9920.0670.0290.0220.2600.0500.1350.0510.111
Age recode with single ages and 90+0.0751.0000.0530.0530.0520.0500.0860.0750.0440.1060.0580.0820.0410.0870.0000.0700.1150.1990.0160.0470.1560.0000.0000.0090.017-0.108-0.0900.0000.0240.0000.0200.0930.0620.0820.0640.1020.0860.1040.0710.1590.0000.0000.0000.0170.0000.1120.1270.0970.0950.0160.0000.0000.1000.0280.0000.2570.0040.094
COD to site rec KM0.1240.0531.0001.0000.9750.9750.0950.1060.2700.1420.1170.1140.1220.1050.3260.1430.1010.1730.0950.0520.0000.0000.3490.1620.0730.0000.1310.0650.0000.0000.0260.0000.0000.0000.0000.0000.1910.0750.0560.0000.0580.1450.2160.2130.1730.9370.9400.1040.0810.0660.2200.0000.1400.0000.1760.9930.1280.411
COD to site recode0.1240.0531.0001.0000.9750.9750.0950.1060.2700.1420.1170.1140.1220.1050.3260.1430.1010.1730.0950.0520.0000.0000.3490.1620.0730.0000.1310.0650.0000.0000.0260.0000.0000.0000.0000.0000.1910.0750.0560.0000.0580.1450.2160.2130.1730.9370.9400.1040.0810.0660.2200.0000.1400.0000.1760.9930.1280.411
COD to site recode ICD-O-3 2023 Revision0.1310.0520.9750.9751.0001.0000.0920.1090.2710.1490.1130.1350.1260.1190.3260.1590.1050.1740.0960.0480.0450.0000.3600.1640.0600.0000.1250.0550.0000.0000.0000.0000.0000.0000.0000.0000.1930.0700.0510.0000.0640.1580.2110.2110.3090.9370.9400.0990.0770.0600.2280.0000.1360.0000.1760.9920.1260.422
COD to site recode ICD-O-3 2023 Revision Expanded (1999+)0.1320.0500.9750.9751.0001.0000.0920.1090.2740.1500.1130.1350.1250.1170.3260.1580.1040.1730.0940.0450.0460.0000.3600.1710.0560.0000.1250.0510.0000.0000.0000.0000.0000.0000.0000.0000.1950.0720.0470.0000.0620.1570.2100.2100.3090.9370.9400.0980.0910.0580.2270.0000.1410.0000.1800.9920.1270.423
Chemotherapy recode (yes, no/unk)0.1940.0860.0950.0950.0920.0921.0000.1490.3210.1700.5720.5450.4100.0670.3150.3120.1700.0880.2000.4480.1360.0640.0780.2140.0560.0830.1140.2510.0000.1890.1290.2510.0280.7230.0340.0370.1680.0960.1360.1380.0160.0710.0590.2410.0650.0610.0000.0870.0590.2050.0920.0000.0790.5430.1740.0410.1890.020
CoC Accredited Flag (2018+)0.0920.0750.1060.1060.1090.1090.1491.0000.0940.0920.1610.2220.1420.0600.0160.2050.2550.0180.0420.1440.2640.1570.1050.0950.0590.2230.1270.1670.0520.3870.1370.2370.0000.1240.1460.0000.2600.0320.1560.1660.1070.1020.0910.0980.0970.0600.0320.0000.0100.1190.1870.0400.0000.2120.3830.0900.0890.074
Derived EOD 2018 M Recode (2018+)0.7120.0440.2700.2700.2710.2740.3210.0941.0000.7160.9570.7550.2170.1020.6710.3880.2100.0300.1040.2160.0510.0840.1690.4680.0000.1030.6860.7110.0000.3420.1970.3550.0000.0900.0140.0480.2790.0000.1500.1810.0630.1500.1210.5080.1490.2080.0530.0000.0860.7040.0710.0000.0010.3560.0610.2690.1380.187
Derived EOD 2018 N Recode (2018+)0.7070.1060.1420.1420.1490.1500.1700.0920.7161.0000.7770.7130.1420.0940.1920.2780.7020.0260.0370.1510.0440.0580.1250.1700.0000.0820.6830.7020.0170.3460.0890.2900.0000.0930.0100.0490.2040.0000.1880.4460.0460.1380.1320.1810.1350.1370.0640.0000.0320.6980.0700.0000.0220.3250.0560.1620.0450.125
Derived EOD 2018 Stage Group Recode (2018+)0.8460.0580.1170.1170.1130.1130.5720.1610.9570.7771.0000.5610.4740.1010.6150.3130.3120.0730.1700.4790.0910.0420.1650.3540.0710.0730.4000.4120.0400.2850.1150.2410.0000.2560.0480.0470.2110.0100.1350.1630.0360.1510.1260.4710.1500.2100.0730.0000.1200.3990.0640.0000.0000.3710.1000.2840.1150.129
Derived EOD 2018 T Recode (2018+)0.7130.0820.1140.1140.1350.1350.5450.2220.7550.7130.5611.0000.1960.0660.2960.5690.3140.0960.0660.2190.1010.0410.1670.2040.0560.0510.3940.4290.0570.3520.0920.2680.0000.2080.0370.0000.2120.0410.1660.1760.0370.2300.1550.2710.1740.1600.0490.0360.1120.4090.1070.0000.0340.7320.1670.2150.0920.100
Derived Summary Grade 2018 (2018+)0.1410.0410.1220.1220.1260.1250.4100.1420.2170.1420.4740.1961.0000.0930.1910.2040.1450.0570.6610.8580.0590.0180.0720.1230.0750.0280.0830.1620.0500.2160.0710.2350.0000.2990.0270.0000.2160.0460.2330.0980.0000.0750.0620.1650.0640.1420.0760.0420.0820.1080.0390.0000.0490.1950.0690.2140.0810.099
Diagnostic Confirmation0.1080.0870.1050.1050.1190.1170.0670.0600.1020.0940.1010.0660.0931.0000.0310.0650.0260.0610.0390.0850.0230.0000.0730.0880.0410.0250.0580.0680.0000.0000.0000.1040.0000.0000.0780.0000.1610.0000.0000.0000.0030.0710.1450.0850.0710.0770.0690.0150.0000.0590.2310.0000.0000.0000.0560.1080.0170.042
EOD Mets Recode (2018+)0.1780.0000.3260.3260.3260.3260.3150.0160.6710.1920.6150.2960.1910.0311.0000.3330.1580.0200.0960.1910.0000.0550.1920.4650.0160.0780.1760.2920.0000.0370.1740.2350.0000.0540.0140.0370.3280.0000.0000.0420.0610.1040.0510.5350.1020.2040.0480.0000.0940.1940.0000.0000.0000.4220.0000.2600.1350.172
EOD Primary Tumor Recode (2018+)0.2800.0700.1430.1430.1590.1580.3120.2050.3880.2780.3130.5690.2040.0650.3331.0000.3710.0510.0890.2000.1290.0500.1730.2260.0390.0570.2080.2380.0650.4120.1340.2770.0630.1290.0440.0190.2580.0080.2080.2170.0330.2580.1620.2950.1680.1620.0330.0000.0700.2180.1060.0000.0290.4630.1870.2090.0760.102
EOD Regional Nodes Recode (2018+)0.1650.1150.1010.1010.1050.1040.1700.2550.2100.7020.3120.3140.1450.0260.1580.3711.0000.0120.0100.1510.2110.0170.2080.1710.0280.0870.1330.1710.0560.4830.1540.2150.0000.0870.0690.0730.2270.0000.2290.4250.0370.2190.1830.2210.1910.1210.0770.0000.0310.1380.1640.0000.0380.3710.2950.1460.0260.093
First malignant primary indicator0.0120.1990.1730.1730.1740.1730.0880.0180.0300.0260.0730.0960.0570.0610.0200.0510.0121.0000.0460.0450.0410.0000.0240.0350.0000.0930.0210.0000.1100.0600.0000.1030.0000.1180.0750.0000.0360.8820.0510.0420.0000.0280.0330.0400.0390.0000.1380.9620.0000.0520.0450.0280.8690.0000.0560.0780.0000.040
Grade Clinical (2018+)0.0500.0160.0950.0950.0960.0940.2000.0420.1040.0370.1700.0660.6610.0390.0960.0890.0100.0461.0000.2840.0270.0270.0590.0600.0720.0290.0490.1810.0000.0000.0180.1010.0000.0710.0000.0000.0460.0960.4100.0000.0180.0410.0000.0510.0000.0820.0080.0920.0220.1250.0000.0000.0980.1300.0000.0860.0550.052
Grade Pathological (2018+)0.1380.0470.0520.0520.0480.0450.4480.1440.2160.1510.4790.2190.8580.0850.1910.2000.1510.0450.2841.0000.0580.0170.0740.1110.0490.0000.0800.1070.0500.2230.0850.3030.0000.3620.0180.0200.2910.0430.2440.1060.0050.0700.0530.1730.0650.1390.0670.0370.0770.0950.0210.0000.0430.2130.0620.2170.0660.076
Marital status at diagnosis0.0410.1560.0000.0000.0450.0460.1360.2640.0510.0440.0910.1010.0590.0230.0000.1290.2110.0410.0270.0581.0000.0240.1310.1040.0290.0680.0000.0430.0000.2470.0740.0690.1030.0390.1750.0500.0900.0280.1290.1200.0490.1160.1130.1110.1130.0210.0330.0120.2310.0150.1560.0230.0370.0000.2630.0750.0280.015
Median household income inflation adj to 20230.0760.0000.0000.0000.0000.0000.0640.1570.0840.0580.0420.0410.0180.0000.0550.0500.0170.0000.0270.0170.0241.0000.0160.0590.4360.5140.0420.0450.0000.1340.0240.0540.0000.0570.2960.0610.0410.0000.0520.0410.4780.0380.0650.0530.1270.0000.0000.0000.0000.0140.0000.0000.0150.0000.0650.0000.1260.025
Mets at DX-Distant LN (2016+)0.1180.0000.3490.3490.3600.3600.0780.1050.1690.1250.1650.1670.0720.0730.1920.1730.2080.0240.0590.0740.1310.0161.0000.6400.0150.0420.1350.1550.0000.2110.1620.2050.0000.0060.0790.0000.2330.0000.0630.1110.0000.6760.6490.6320.6910.1280.0180.0000.0050.1300.1250.0000.0000.0680.1690.1620.0550.109
Mets at DX-Other (2016+)0.1650.0090.1620.1620.1640.1710.2140.0950.4680.1700.3540.2040.1230.0880.4650.2260.1710.0350.0600.1110.1040.0590.6401.0000.0000.0610.1410.1590.0000.2050.1940.2090.0260.0730.0650.0000.1960.0000.0000.1110.0260.6470.6260.5930.6510.1770.0170.0000.0620.1470.1010.0000.0000.2860.1210.2040.0730.113
PRCDA 20200.0120.0170.0730.0730.0600.0560.0560.0590.0000.0000.0710.0560.0750.0410.0160.0390.0280.0000.0720.0490.0290.4360.0150.0001.0000.8740.0070.0800.0000.0700.0340.0840.0000.0820.3190.0000.0530.0000.0540.0730.4320.0160.0000.0000.0000.0000.0220.0000.0000.0000.0510.0000.0210.0670.0200.0000.1490.000
Patient ID0.092-0.1080.0000.0000.0000.0000.0830.2230.1030.0820.0730.0510.0280.0250.0780.0570.0870.0930.0290.0000.0680.5140.0420.0610.8741.000-0.0040.0580.0190.1220.0570.0490.0000.0460.3650.0460.0770.1680.0000.0510.3920.0420.0760.0850.0540.0250.0000.0920.0200.0480.0720.0000.0900.0030.0580.0000.0840.000
Primary Site0.691-0.0900.1310.1310.1250.1250.1140.1270.6860.6830.4000.3940.0830.0580.1760.2080.1330.0210.0490.0800.0000.0420.1350.1410.007-0.0041.0000.9920.0000.2430.0420.4140.0000.0000.0000.0000.1220.0680.1710.1820.0520.1440.1230.1930.1460.1350.0000.0000.0200.9330.0390.0000.0000.2020.0300.1550.0470.137
Primary Site - labeled0.9910.0000.0650.0650.0550.0510.2510.1670.7110.7020.4120.4290.1620.0680.2920.2380.1710.0000.1810.1070.0430.0450.1550.1590.0800.0580.9921.0000.0000.1870.0000.2610.0000.0910.1240.0820.1480.0610.1520.1720.0700.1180.3000.2150.1690.1820.0940.0000.0800.9940.0000.0000.0730.1150.0000.1850.0750.198
Primary by international rules0.0000.0240.0000.0000.0000.0000.0000.0520.0000.0170.0400.0570.0500.0000.0000.0650.0560.1100.0000.0500.0000.0000.0000.0000.0000.0190.0000.0001.0000.0000.0620.0000.0000.0000.0000.0000.1030.1120.0000.0000.0000.0000.0000.0220.0000.0000.0000.1170.0000.0000.0000.0000.1530.0000.0870.0000.0000.000
RX Summ--Scope Reg LN Sur (2003+)0.3490.0000.0000.0000.0000.0000.1890.3870.3420.3460.2850.3520.2160.0000.0370.4120.4830.0600.0000.2230.2470.1340.2110.2050.0700.1220.2430.1870.0001.0000.1970.4410.0000.1590.0940.0000.6040.0210.7690.4870.0000.1910.2620.1860.2620.0000.0000.0100.1070.2190.3320.3300.0430.2080.3280.0800.0370.089
RX Summ--Surg Oth Reg/Dis (2003+)0.0700.0200.0260.0260.0000.0000.1290.1370.1970.0890.1150.0920.0710.0000.1740.1340.1540.0000.0180.0850.0740.0240.1620.1940.0340.0570.0420.0000.0620.1971.0000.3510.0500.1170.0950.0000.3260.0000.1430.0990.0340.1620.1530.1840.1590.1130.1190.0000.0000.0150.0220.0000.0000.3370.1430.0580.0350.042
RX Summ--Surg Prim Site (1998+)0.3910.0930.0000.0000.0000.0000.2510.2370.3550.2900.2410.2680.2350.1040.2350.2770.2150.1030.1010.3030.0690.0540.2050.2090.0840.0490.4140.2610.0000.4410.3511.0000.5360.1870.0910.1690.5100.1080.2100.2630.0890.1920.1750.2790.1900.1800.0810.1180.0710.3230.0000.0000.0660.1550.1260.2750.0590.133
RX Summ--Surg/Rad Seq0.0000.0620.0000.0000.0000.0000.0280.0000.0000.0000.0000.0000.0000.0000.0000.0630.0000.0000.0000.0000.1030.0000.0000.0260.0000.0000.0000.0000.0000.0000.0500.5361.0000.0810.0360.4250.0000.0930.0000.0000.0000.0000.0000.0000.0000.0000.0000.0960.0000.0000.0000.0000.0890.0000.0000.0000.0210.000
RX Summ--Systemic/Sur Seq (2007+)0.1650.0820.0000.0000.0000.0000.7230.1240.0900.0930.2560.2080.2990.0000.0540.1290.0870.1180.0710.3620.0390.0570.0060.0730.0820.0460.0000.0910.0000.1590.1170.1870.0811.0000.0470.0350.1220.0480.2220.1370.0700.0000.0000.0000.0000.0000.0000.0440.0240.0950.0000.0000.0520.2990.0450.0710.0620.046
Race recode (White, Black, Other)0.0920.0640.0000.0000.0000.0000.0340.1460.0140.0100.0480.0370.0270.0780.0140.0440.0690.0750.0000.0180.1750.2960.0790.0650.3190.3650.0000.1240.0000.0940.0950.0910.0360.0471.0000.0380.0640.0360.0770.0820.2090.0750.0920.0850.0780.0300.0340.0410.0590.0840.0870.0000.0470.0000.1300.0140.0450.000
Radiation recode0.0000.1020.0000.0000.0000.0000.0370.0000.0480.0490.0470.0000.0000.0000.0370.0190.0730.0000.0000.0200.0500.0610.0000.0000.0000.0460.0000.0820.0000.0000.0000.1690.4250.0350.0381.0000.0000.0000.0790.2850.0000.0000.1460.0240.0000.0000.0000.0000.0000.0290.0000.0270.0000.2120.0000.0160.0000.000
Reason no cancer-directed surgery0.1930.0860.1910.1910.1930.1950.1680.2600.2790.2040.2110.2120.2160.1610.3280.2580.2270.0360.0460.2910.0900.0410.2330.1960.0530.0770.1220.1480.1030.6040.3260.5100.0000.1220.0640.0001.0000.0000.1010.1530.0290.2070.1940.2900.2090.2440.1640.0000.0660.1320.1310.0950.0000.1250.1960.3090.0420.143
Record number recode0.0000.1040.0750.0750.0700.0720.0960.0320.0000.0000.0100.0410.0460.0000.0000.0080.0000.8820.0960.0430.0280.0000.0000.0000.0000.1680.0680.0610.1120.0210.0000.1080.0930.0480.0360.0000.0001.0000.1280.0560.0000.0000.0000.0000.0000.0000.0770.8150.0640.0470.0000.1500.6600.0000.0000.0600.0110.000
Regional nodes examined (1988+)0.2090.0710.0560.0560.0510.0470.1360.1560.1500.1880.1350.1660.2330.0000.0000.2080.2290.0510.4100.2440.1290.0520.0630.0000.0540.0000.1710.1520.0000.7690.1430.2100.0000.2220.0770.0790.1010.1281.0000.5930.0000.0590.0000.0000.0000.0000.0000.1490.0780.1630.0990.0000.1040.2610.1430.0380.0430.032
Regional nodes positive (1988+)0.2080.1590.0000.0000.0000.0000.1380.1660.1810.4460.1630.1760.0980.0000.0420.2170.4250.0420.0000.1060.1200.0410.1110.1110.0730.0510.1820.1720.0000.4870.0990.2630.0000.1370.0820.2850.1530.0560.5931.0000.0000.1000.0900.1170.0960.0880.0580.0430.0510.1740.1460.0110.0390.2060.1740.0750.0450.103
Rural-Urban Continuum Code0.0550.0000.0580.0580.0640.0620.0160.1070.0630.0460.0360.0370.0000.0030.0610.0330.0370.0000.0180.0050.0490.4780.0000.0260.4320.3920.0520.0700.0000.0000.0340.0890.0000.0700.2090.0000.0290.0000.0000.0001.0000.0160.0000.0260.0340.0210.0000.0000.0480.0570.0140.0000.0000.0800.0240.0210.0420.000
SEER Combined Mets at DX-bone (2010+)0.1350.0000.1450.1450.1580.1570.0710.1020.1500.1380.1510.2300.0750.0710.1040.2580.2190.0280.0410.0700.1160.0380.6760.6470.0160.0420.1440.1180.0000.1910.1620.1920.0000.0000.0750.0000.2070.0000.0590.1000.0161.0000.6610.6280.6830.0600.0580.0000.0000.1390.1220.0000.0100.0000.1660.1090.0910.052
SEER Combined Mets at DX-brain (2010+)0.1180.0000.2160.2160.2110.2100.0590.0910.1210.1320.1260.1550.0620.1450.0510.1620.1830.0330.0000.0530.1130.0650.6490.6260.0000.0760.1230.3000.0000.2620.1530.1750.0000.0000.0920.1460.1940.0000.0000.0900.0000.6611.0000.6010.6840.0680.0370.0000.0000.1260.1180.0000.0000.0000.1580.1160.0400.037
SEER Combined Mets at DX-liver (2010+)0.1740.0170.2130.2130.2110.2100.2410.0980.5080.1810.4710.2710.1650.0850.5350.2950.2210.0400.0510.1730.1110.0530.6320.5930.0000.0850.1930.2150.0220.1860.1840.2790.0000.0000.0850.0240.2900.0000.0000.1170.0260.6280.6011.0000.6230.1480.0320.0000.0610.1940.1320.0000.0000.2400.1570.1940.1180.133
SEER Combined Mets at DX-lung (2010+)0.1350.0000.1730.1730.3090.3090.0650.0970.1490.1350.1500.1740.0640.0710.1020.1680.1910.0390.0000.0650.1130.1270.6910.6510.0000.0540.1460.1690.0000.2620.1590.1900.0000.0000.0780.0000.2090.0000.0000.0960.0340.6830.6840.6231.0000.0870.0000.0000.0000.1710.1220.0200.0100.0000.1660.1090.0450.083
SEER cause-specific death classification0.1140.1120.9370.9370.9370.9370.0610.0600.2080.1370.2100.1600.1420.0770.2040.1620.1210.0000.0820.1390.0210.0000.1280.1770.0000.0250.1350.1820.0000.0000.1130.1800.0000.0000.0300.0000.2440.0000.0000.0880.0210.0600.0680.1480.0871.0000.7080.0000.0430.1500.1320.0000.0000.1070.0230.6840.1130.352
SEER other cause of death classification0.0250.1270.9400.9400.9400.9400.0000.0320.0530.0640.0730.0490.0760.0690.0480.0330.0770.1380.0080.0670.0330.0000.0180.0170.0220.0000.0000.0940.0000.0000.1190.0810.0000.0000.0340.0000.1640.0770.0000.0580.0000.0580.0370.0320.0000.7081.0000.0970.0380.0600.1670.0000.1020.0000.1160.7060.0950.324
Sequence number0.0130.0970.1040.1040.0990.0980.0870.0000.0000.0000.0000.0360.0420.0150.0000.0000.0000.9620.0920.0370.0120.0000.0000.0000.0000.0920.0000.0000.1170.0100.0000.1180.0960.0440.0410.0000.0000.8150.1490.0430.0000.0000.0000.0000.0000.0000.0971.0000.0710.0000.0000.0630.8130.0000.0000.0750.0490.010
Sex0.0320.0950.0810.0810.0770.0910.0590.0100.0860.0320.1200.1120.0820.0000.0940.0700.0310.0000.0220.0770.2310.0000.0050.0620.0000.0200.0200.0800.0000.1070.0000.0710.0000.0240.0590.0000.0660.0640.0780.0510.0480.0000.0000.0610.0000.0430.0380.0711.0000.0530.0220.0000.0570.1390.0280.0670.0310.000
Site recode ICD-O-3 2023 Revision Expanded0.9920.0160.0660.0660.0600.0580.2050.1190.7040.6980.3990.4090.1080.0590.1940.2180.1380.0520.1250.0950.0150.0140.1300.1470.0000.0480.9330.9940.0000.2190.0150.3230.0000.0950.0840.0290.1320.0470.1630.1740.0570.1390.1260.1940.1710.1500.0600.0000.0531.0000.0330.0000.0000.1090.0000.1600.0380.131
Survival months flag0.0670.0000.2200.2200.2280.2270.0920.1870.0710.0700.0640.1070.0390.2310.0000.1060.1640.0450.0000.0210.1560.0000.1250.1010.0510.0720.0390.0000.0000.3320.0220.0000.0000.0000.0870.0000.1310.0000.0990.1460.0140.1220.1180.1320.1220.1320.1670.0000.0220.0331.0000.0200.0000.0000.5410.1270.0320.095
Total number of benign/borderline tumors for patient0.0290.0000.0000.0000.0000.0000.0000.0400.0000.0000.0000.0000.0000.0000.0000.0000.0000.0280.0000.0000.0230.0000.0000.0000.0000.0000.0000.0000.0000.3300.0000.0000.0000.0000.0000.0270.0950.1500.0000.0110.0000.0000.0000.0000.0200.0000.0000.0630.0000.0000.0201.0000.0330.0000.0110.0000.0000.000
Total number of in situ/malignant tumors for patient0.0220.1000.1400.1400.1360.1410.0790.0000.0010.0220.0000.0340.0490.0000.0000.0290.0380.8690.0980.0430.0370.0150.0000.0000.0210.0900.0000.0730.1530.0430.0000.0660.0890.0520.0470.0000.0000.6600.1040.0390.0000.0100.0000.0000.0100.0000.1020.8130.0570.0000.0000.0331.0000.0000.0070.0790.0140.000
Tumor Size Summary (2016+)0.2600.0280.0000.0000.0000.0000.5430.2120.3560.3250.3710.7320.1950.0000.4220.4630.3710.0000.1300.2130.0000.0000.0680.2860.0670.0030.2020.1150.0000.2080.3370.1550.0000.2990.0000.2120.1250.0000.2610.2060.0800.0000.0000.2400.0000.1070.0000.0000.1390.1090.0000.0000.0001.0000.1330.2200.0880.078
Type of Reporting Source0.0500.0000.1760.1760.1760.1800.1740.3830.0610.0560.1000.1670.0690.0560.0000.1870.2950.0560.0000.0620.2630.0650.1690.1210.0200.0580.0300.0000.0870.3280.1430.1260.0000.0450.1300.0000.1960.0000.1430.1740.0240.1660.1580.1570.1660.0230.1160.0000.0280.0000.5410.0110.0070.1331.0000.1140.0390.028
Vital status recode (study cutoff used)0.1350.2570.9930.9930.9920.9920.0410.0900.2690.1620.2840.2150.2140.1080.2600.2090.1460.0780.0860.2170.0750.0000.1620.2040.0000.0000.1550.1850.0000.0800.0580.2750.0000.0710.0140.0160.3090.0600.0380.0750.0210.1090.1160.1940.1090.6840.7060.0750.0670.1600.1270.0000.0790.2200.1141.0000.2180.658
Year of diagnosis0.0510.0040.1280.1280.1260.1270.1890.0890.1380.0450.1150.0920.0810.0170.1350.0760.0260.0000.0550.0660.0280.1260.0550.0730.1490.0840.0470.0750.0000.0370.0350.0590.0210.0620.0450.0000.0420.0110.0430.0450.0420.0910.0400.1180.0450.1130.0950.0490.0310.0380.0320.0000.0140.0880.0390.2181.0000.208
Year of follow-up recode0.1110.0940.4110.4110.4220.4230.0200.0740.1870.1250.1290.1000.0990.0420.1720.1020.0930.0400.0520.0760.0150.0250.1090.1130.0000.0000.1370.1980.0000.0890.0420.1330.0000.0460.0000.0000.1430.0000.0320.1030.0000.0520.0370.1330.0830.3520.3240.0100.0000.1310.0950.0000.0000.0780.0280.6580.2081.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
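
The two displays described above can be approximated without a plotting library. The following is a minimal sketch, not the code the report generator runs: the helper names (is_missing, nullity_by_column, nullity_matrix, missing_pct) and the toy records are hypothetical and chosen only for illustration. It assumes a cell counts as missing when it is None or a float NaN.

```python
import math


def is_missing(value):
    """Assumption: a cell is missing if it is None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def nullity_by_column(rows, columns):
    """Count missing cells per column (the quantity a nullity bar chart shows)."""
    return {col: sum(is_missing(row.get(col)) for row in rows) for col in columns}


def nullity_matrix(rows, columns):
    """One string per record: '#' where a value is present, '.' where it is missing."""
    return [
        "".join("." if is_missing(row.get(col)) else "#" for col in columns)
        for row in rows
    ]


def missing_pct(missing_cells, n_variables, n_observations):
    """Share of missing cells over all cells, as a percentage."""
    return 100.0 * missing_cells / (n_variables * n_observations)


# Hypothetical toy records, not taken from the profiled dataset.
columns = ["site", "nodes_removed", "size_mm"]
rows = [
    {"site": "Stomach", "nodes_removed": None, "size_mm": 28},
    {"site": "Small Intestine", "nodes_removed": "4 or more", "size_mm": 23},
    {"site": "Stomach", "nodes_removed": float("nan"), "size_mm": None},
]

counts = nullity_by_column(rows, columns)
matrix = nullity_matrix(rows, columns)
for line in matrix:
    print(line)

# Figures from the report overview: 1875 missing cells, 61 variables, 2430 observations.
print(f"{missing_pct(1875, 61, 2430):.1f}%")  # prints 1.3%
```

The last line reproduces the overview's "Missing cells (%)" figure from its own counts: 1875 / (61 × 2430) ≈ 1.3%. Reading the matrix rows top to bottom shows at a glance which records share the same gaps, which is the pattern the nullity matrix is meant to expose.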

Sample

Race recode (White, Black, Other)SexYear of diagnosisPRCDA 2020Site recode ICD-O-3 2023 Revision ExpandedPrimary Site - labeledPrimary SiteDerived Summary Grade 2018 (2018+)Grade Clinical (2018+)Grade Pathological (2018+)Diagnostic ConfirmationAJCC ID (2018+)Derived EOD 2018 T Recode (2018+)Derived EOD 2018 N Recode (2018+)Derived EOD 2018 M Recode (2018+)Derived EOD 2018 Stage Group Recode (2018+)RX Summ--Surg Prim Site (1998+)RX Summ--Scope Reg LN Sur (2003+)RX Summ--Surg Oth Reg/Dis (2003+)RX Summ--Surg/Rad SeqReason no cancer-directed surgeryRadiation recodeChemotherapy recode (yes, no/unk)RX Summ--Systemic/Sur Seq (2007+)Time from diagnosis to treatment in days recodeEOD Primary Tumor Recode (2018+)EOD Regional Nodes Recode (2018+)EOD Mets Recode (2018+)Tumor Size Over Time Recode (1988+)Tumor Size Summary (2016+)Regional nodes examined (1988+)Regional nodes positive (1988+)SEER Combined Mets at DX-bone (2010+)SEER Combined Mets at DX-brain (2010+)SEER Combined Mets at DX-liver (2010+)SEER Combined Mets at DX-lung (2010+)Mets at DX-Distant LN (2016+)Mets at DX-Other (2016+)COD to site recodeSEER cause-specific death classificationSEER other cause of death classificationSurvival monthsSurvival months flagCOD to site rec KMCOD to site recode ICD-O-3 2023 RevisionCOD to site recode ICD-O-3 2023 Revision Expanded (1999+)Vital status recode (study cutoff used)Sequence numberFirst malignant primary indicatorPrimary by international rulesRecord number recodeTotal number of in situ/malignant tumors for patientTotal number of benign/borderline tumors for patientAge recode with single ages and 90+Year of follow-up recodePatient IDType of Reporting SourceMarital status at diagnosisCoC Accredited Flag (2018+)Median household income inflation adj to 2023Rural-Urban Continuum Code
0WhiteFemale2022Not PRCDAStomachC16.3-Gastric antrum163L9LPositive histologyGIST: Gastric and OmentalT2N0M01A30NaNNone; diagnosed at autopsyNo radiation and/or no surgery; unknown if surgery and/or radiation givenSurgery performedNone/UnknownNo/UnknownNo systemic therapy and/or surgical procedures045100000000280280098NoNoNoNoNone; no lymph node metastasesNone; no other metastasesAliveAlive or dead of other causeAlive or dead due to cancer0007Complete dates are available and there are more than 0 days of survivalAliveAliveAliveAliveOne primary onlyYesYes210672022812Hospital inpatient/outpatient or clinicMarried (including common law)ANALYTIC abstract from facility WITH CoC accreditation$120,000+Counties in metropolitan areas ge 1 million pop
4BlackFemale2020Not PRCDASmall IntestineC17.1-Jejunum171L9LPositive histologyGIST: Small Intestinal, Esophageal, Colorectal, Mesenteric, and PeritonealT2N0M01304 or more regional lymph nodes removedNone; diagnosed at autopsyNo radiation and/or no surgery; unknown if surgery and/or radiation givenSurgery performedNone/UnknownNo/UnknownNo systemic therapy and/or surgical procedures000100000000230230700NoNoNoNoNone; no lymph node metastasesNone; no other metastasesAliveAlive or dead of other causeAlive or dead due to cancer0025Complete dates are available and there are more than 0 days of survivalAliveAliveAliveAliveOne primary onlyYesYes21077202219511Hospital inpatient/outpatient or clinicSingle (never married)ANALYTIC abstract from facility WITH CoC accreditation$120,000+Counties in metropolitan areas ge 1 million pop
15BlackFemale2018Not PRCDASmall IntestineC17.9-Small intestine, NOS179999Positive histologyGIST: Small Intestinal, Esophageal, Colorectal, Mesenteric, and PeritonealT2N0M099301 to 3 regional lymph nodes removedNone; diagnosed at autopsyNo radiation and/or no surgery; unknown if surgery and/or radiation givenSurgery performedNone/UnknownNo/UnknownNo systemic therapy and/or surgical procedures000100000000350350200NoNoNoNoNone; no lymph node metastasesNone; no other metastasesPancreasAlive or dead of other causeDead (attributable to causes other than this cancer dx)0027Complete dates are available and there are more than 0 days of survivalPancreasPancreasPancreasDead2nd of 2 or more primariesNoYes220862021200360Hospital inpatient/outpatient or clinicUnknownANALYTIC abstract from facility WITH CoC accreditation$120,000+Counties in metropolitan areas ge 1 million pop
19WhiteFemale2019Not PRCDAStomachC16.6-Greater curvature of stomach NOS166L9LPositive histologyGIST: Gastric and OmentalT2N0M01A30NaNNone; diagnosed at autopsyNo radiation and/or no surgery; unknown if surgery and/or radiation givenSurgery performedNone/UnknownNo/UnknownNo systemic therapy and/or surgical procedures025100000000220220098NoNoNoNoNone; no lymph node metastasesNone; no other metastasesAliveAlive or dead of other causeAlive or dead due to cancer0043Complete dates are available and there are more than 0 days of survivalAliveAliveAliveAlive2nd of 2 or more primariesNoYes220702022259988Hospital inpatient/outpatient or clinicMarried (including common law)Abstract from facility WITHOUT CoC accreditation$120,000+Counties in metropolitan areas ge 1 million pop
42WhiteFemale2021Not PRCDAStomachC16.1-Fundus of stomach161999Positive histologyGIST: Gastric and OmentalT2N0M09930NaNNone; diagnosed at autopsyNo radiation and/or no surgery; unknown if surgery and/or radiation givenSurgery performedNone/UnknownNo/UnknownNo systemic therapy and/or surgical procedures002100000000230230098NoNoNoNoNone; no lymph node metastasesNone; no other metastasesAliveAlive or dead of other causeAlive or dead due to cancer0012Complete dates are available and there are more than 0 days of survivalAliveAliveAliveAlive2nd of 2 or more primariesYesYes220772022511662Hospital inpatient/outpatient or clinicSingle (never married)ANALYTIC abstract from facility WITH CoC accreditation$120,000+Counties in metropolitan areas ge 1 million pop
46WhiteMale2019Not PRCDAStomachC16.9-Stomach, NOS169H9HPositive histologyGIST: Gastric and OmentalT4N0M14304 or more regional lymph nodes removedNone; diagnosed at autopsyNo radiation and/or no surgery; unknown if surgery and/or radiation givenSurgery performedNone/UnknownNo/UnknownNo systemic therapy and/or surgical procedures000700000702502502100NoNoNoNoNone; no lymph node metastasesYes; distant mets in known site(s) other than bone, brain, liver, lung, dist LNAliveAlive or dead of other causeAlive or dead due to cancer0040Complete dates are available and there are more than 0 days of survivalAliveAliveAliveAlive2nd of 2 or more primariesNoYes220792022544070Hospital inpatient/outpatient or clinicMarried (including common law)ANALYTIC abstract from facility WITH CoC accreditation$120,000+Counties in metropolitan areas ge 1 million pop
63WhiteMale2022Not PRCDASmall IntestineC17.1-Jejunum171L9LPositive histologyGIST: Small Intestinal, Esophageal, Colorectal, Mesenteric, and PeritonealT4N0M03A304 or more regional lymph nodes removedNone; diagnosed at autopsyNo radiation and/or no surgery; unknown if surgery and/or radiation givenSurgery performedNone/UnknownYesSystemic therapy after surgery055400000001301300700NoNoNoNoNone; no lymph node metastasesNone; no other metastasesAliveAlive or dead of other causeAlive or dead due to cancer0006Complete dates are available and there are more than 0 days of survivalAliveAliveAliveAlive2nd of 2 or more primariesNoYes220732022641443Hospital inpatient/outpatient or clinicMarried (including common law)ANALYTIC abstract from facility WITH CoC accreditation$120,000+Counties in metropolitan areas ge 1 million pop
65UnknownFemale2021Not PRCDAStomachC16.9-Stomach, NOS169999Positive histologyGIST: Gastric and OmentalTXN0M09900Unknown or not applicableNone; diagnosed at autopsyNo radiation and/or no surgery; unknown if surgery and/or radiation givenNot recommendedNone/UnknownNo/UnknownNo systemic therapy and/or surgical proceduresUnable to calculate99999900Unknown or size unreasonable (includes any tumor sizes 401-989)9999999NoNoNoNoNone; no lymph node metastasesNone; no other metastasesAliveAlive or dead of other causeAlive or dead due to cancer0018Complete dates are available and there are more than 0 days of survivalAliveAliveAliveAliveOne primary onlyYesYes210502022651626Laboratory only (hospital or private)UnknownAbstract from facility WITHOUT CoC accreditation$120,000+Counties in metropolitan areas ge 1 million pop
66BlackMale2019Not PRCDAStomachC16.9-Stomach, NOS169999Positive histologyGIST: Gastric and OmentalT3N0M09930NaNNone; diagnosed at autopsyNo radiation and/or no surgery; unknown if surgery and/or radiation givenSurgery performedNone/UnknownNo/UnknownNo systemic therapy and/or surgical procedures000400000000940940098NoNoNoNoNone; no lymph node metastasesNone; no other metastasesMiscellaneous Malignant CancerAlive or dead of other causeDead (attributable to causes other than this cancer dx)0035Complete dates are available and there are more than 0 days of survivalMiscellaneous Malignant CancerMiscellaneous NeoplasmsMiscellaneous NeoplasmsDead2nd of 2 or more primariesNoYes220822022654059Hospital inpatient/outpatient or clinicUnknownANALYTIC abstract from facility WITH CoC accreditation$120,000+Counties in metropolitan areas ge 1 million pop
82WhiteMale2019Not PRCDAStomachC16.9-Stomach, NOS169LLLPositive histologyGIST: Gastric and OmentalT3N0M01B301 to 3 regional lymph nodes removedNone; diagnosed at autopsyNo radiation and/or no surgery; unknown if surgery and/or radiation givenSurgery performedNone/UnknownNo/UnknownNo systemic therapy and/or surgical procedures028100000000530530300NoNoNoNoNone; no lymph node metastasesNone; no other metastasesAliveAlive or dead of other causeAlive or dead due to cancer0037Complete dates are available and there are more than 0 days of survivalAliveAliveAliveAlive2nd of 2 or more primariesNoYes220772022686799Hospital inpatient/outpatient or clinicMarried (including common law)ANALYTIC abstract from facility WITH CoC accreditation$120,000+Counties in metropolitan areas ge 1 million pop
Race recode (White, Black, Other)SexYear of diagnosisPRCDA 2020Site recode ICD-O-3 2023 Revision ExpandedPrimary Site - labeledPrimary SiteDerived Summary Grade 2018 (2018+)Grade Clinical (2018+)Grade Pathological (2018+)Diagnostic ConfirmationAJCC ID (2018+)Derived EOD 2018 T Recode (2018+)Derived EOD 2018 N Recode (2018+)Derived EOD 2018 M Recode (2018+)Derived EOD 2018 Stage Group Recode (2018+)RX Summ--Surg Prim Site (1998+)RX Summ--Scope Reg LN Sur (2003+)RX Summ--Surg Oth Reg/Dis (2003+)RX Summ--Surg/Rad SeqReason no cancer-directed surgeryRadiation recodeChemotherapy recode (yes, no/unk)RX Summ--Systemic/Sur Seq (2007+)Time from diagnosis to treatment in days recodeEOD Primary Tumor Recode (2018+)EOD Regional Nodes Recode (2018+)EOD Mets Recode (2018+)Tumor Size Over Time Recode (1988+)Tumor Size Summary (2016+)Regional nodes examined (1988+)Regional nodes positive (1988+)SEER Combined Mets at DX-bone (2010+)SEER Combined Mets at DX-brain (2010+)SEER Combined Mets at DX-liver (2010+)SEER Combined Mets at DX-lung (2010+)Mets at DX-Distant LN (2016+)Mets at DX-Other (2016+)COD to site recodeSEER cause-specific death classificationSEER other cause of death classificationSurvival monthsSurvival months flagCOD to site rec KMCOD to site recode ICD-O-3 2023 RevisionCOD to site recode ICD-O-3 2023 Revision Expanded (1999+)Vital status recode (study cutoff used)Sequence numberFirst malignant primary indicatorPrimary by international rulesRecord number recodeTotal number of in situ/malignant tumors for patientTotal number of benign/borderline tumors for patientAge recode with single ages and 90+Year of follow-up recodePatient IDType of Reporting SourceMarital status at diagnosisCoC Accredited Flag (2018+)Median household income inflation adj to 2023Rural-Urban Continuum Code
6091BlackMale2022Not PRCDAStomachC16.6-Greater curvature of stomach NOS166L9LPositive histologyGIST: Gastric and OmentalT3N0M01B32NaNNone; diagnosed at autopsyNo radiation and/or no surgery; unknown if surgery and/or radiation givenSurgery performedNone/UnknownNo/UnknownNo systemic therapy and/or surgical procedures000100000000900900098NoNoNoNoNone; no lymph node metastasesNone; no other metastasesAliveAlive or dead of other causeAlive or dead due to cancer0005Complete dates are available and there are more than 0 days of survivalAliveAliveAliveAliveOne primary onlyYesYes11078202222442286Hospital inpatient/outpatient or clinicWidowedANALYTIC abstract from facility WITH CoC accreditation$75,000 - $79,999Counties in metropolitan areas ge 1 million pop
6092BlackFemale2022Not PRCDAStomachC16.9-Stomach, NOS169L9LPositive histologyGIST: Gastric and OmentalT4N0M0233NaNNone; diagnosed at autopsyNo radiation and/or no surgery; unknown if surgery and/or radiation givenSurgery performedNone/UnknownYesSystemic therapy after surgery000100000001031030098NoNoNoNoNone; no lymph node metastasesNone; no other metastasesAliveAlive or dead of other causeAlive or dead due to cancer0005Complete dates are available and there are more than 0 days of survivalAliveAliveAliveAliveOne primary onlyYesYes11058202222442404Hospital inpatient/outpatient or clinicMarried (including common law)ANALYTIC abstract from facility WITH CoC accreditation$75,000 - $79,999Counties in metropolitan areas ge 1 million pop
6093BlackFemale2022Not PRCDAColon And Rectum (Excluding Appendix)C18.7-Sigmoid colon187H9HPositive histologyGIST: Small Intestinal, Esophageal, Colorectal, Mesenteric, and PeritonealT3N0M03B404 or more regional lymph nodes removedNone; diagnosed at autopsyNo radiation and/or no surgery; unknown if surgery and/or radiation givenSurgery performedNone/UnknownYesSystemic therapy after surgery000100000000720721600NoNoNoNoNone; no lymph node metastasesNone; no other metastasesAliveAlive or dead of other causeAlive or dead due to cancer0001Complete dates are available and there are more than 0 days of survivalAliveAliveAliveAliveOne primary onlyYesYes11067202222443599Hospital inpatient/outpatient or clinicSingle (never married)ANALYTIC abstract from facility WITH CoC accreditation$75,000 - $79,999Counties in metropolitan areas ge 1 million pop
6094BlackFemale2022Not PRCDAStomachC16.9-Stomach, NOS169L9LPositive histologyGIST: Gastric and OmentalT2N0M01A30NaNNone; diagnosed at autopsyNo radiation and/or no surgery; unknown if surgery and/or radiation givenSurgery performedNone/UnknownNo/UnknownNo systemic therapy and/or surgical procedures079100000000380380098NoNoNoNoNone; no lymph node metastasesNone; no other metastasesAliveAlive or dead of other causeAlive or dead due to cancer0002Complete dates are available and there are more than 0 days of survivalAliveAliveAliveAliveOne primary onlyYesYes11057202222443657Hospital inpatient/outpatient or clinicMarried (including common law)ANALYTIC abstract from facility WITH CoC accreditation$80,000 - $84,999Counties in metropolitan areas ge 1 million pop
6095BlackMale2022Not PRCDASmall IntestineC17.9-Small intestine, NOS179L9LPositive histologyGIST: Small Intestinal, Esophageal, Colorectal, Mesenteric, and PeritonealT2N0M0130NaNNone; diagnosed at autopsyNo radiation and/or no surgery; unknown if surgery and/or radiation givenSurgery performedNone/UnknownNo/UnknownNo systemic therapy and/or surgical procedures000100000000330330098NoNoNoNoNone; no lymph node metastasesNone; no other metastasesAliveAlive or dead of other causeAlive or dead due to cancer0007Complete dates are available and there are more than 0 days of survivalAliveAliveAliveAliveOne primary onlyYesYes11051202222443676Hospital inpatient/outpatient or clinicSingle (never married)ANALYTIC abstract from facility WITH CoC accreditation$90,000 - $94,999Counties in metropolitan areas ge 1 million pop
6096BlackMale2022Not PRCDASmall IntestineC17.9-Small intestine, NOS179999Positive histologyGIST: Small Intestinal, Esophageal, Colorectal, Mesenteric, and PeritonealTXN0M09900NaNNone; diagnosed at autopsyNo radiation and/or no surgery; unknown if surgery and/or radiation givenNot recommendedNone/UnknownNo/UnknownNo systemic therapy and/or surgical proceduresUnable to calculate10000000Unknown or size unreasonable (includes any tumor sizes 401-989)9990098NoNoNoNoNone; no lymph node metastasesNone; no other metastasesAliveAlive or dead of other causeAlive or dead due to cancer0000Complete dates are available and there are more than 0 days of survivalAliveAliveAliveAliveOne primary onlyYesYes11036202222443682Hospital inpatient/outpatient or clinicSingle (never married)ANALYTIC abstract from facility WITH CoC accreditation$90,000 - $94,999Counties in metropolitan areas ge 1 million pop
6097BlackMale2022Not PRCDAStomachC16.2-Body of stomach162999Positive histologyGIST: Gastric and OmentalT4N0M09900NaNNone; diagnosed at autopsyNo radiation and/or no surgery; unknown if surgery and/or radiation givenNot recommendedNone/UnknownYesNo systemic therapy and/or surgical procedures00810000000Unknown or size unreasonable (includes any tumor sizes 401-989)2100098NoNoNoNoNone; no lymph node metastasesNone; no other metastasesAliveAlive or dead of other causeAlive or dead due to cancer0003Complete dates are available and there are more than 0 days of survivalAliveAliveAliveAliveOne primary onlyYesYes11060202222443698Hospital inpatient/outpatient or clinicMarried (including common law)ANALYTIC abstract from facility WITH CoC accreditation$75,000 - $79,999Counties in metropolitan areas ge 1 million pop
6098BlackMale2022Not PRCDAStomachC16.3-Gastric antrum163L9LPositive histologyGIST: Gastric and OmentalT4N0M02334 or more regional lymph nodes removedNone; diagnosed at autopsyNo radiation and/or no surgery; unknown if surgery and/or radiation givenSurgery performedNone/UnknownYesSystemic therapy after surgery331400000001221221500NoNoNoNoNone; no lymph node metastasesNone; no other metastasesAliveAlive or dead of other causeAlive or dead due to cancer0009Complete dates are available and there are more than 0 days of survivalAliveAliveAliveAliveOne primary onlyYesYes11064202222443797Hospital inpatient/outpatient or clinicMarried (including common law)ANALYTIC abstract from facility WITH CoC accreditation$75,000 - $79,999Counties in metropolitan areas ge 1 million pop
6103UnknownMale2022Not PRCDAStomachC16.2-Body of stomach162LL9UnknownGIST: Gastric and OmentalT1N0M01A00NaNNone; diagnosed at autopsyNo radiation and/or no surgery; unknown if surgery and/or radiation givenNot recommendedNone/UnknownNo/UnknownNo systemic therapy and/or surgical proceduresUnable to calculate100000000190190098NoNoNoNoNone; no lymph node metastasesNone; no other metastasesAliveAlive or dead of other causeAlive or dead due to cancer0004Complete dates are available and there are more than 0 days of survivalAliveAliveAliveAliveOne primary onlyYesYes11087202222445847Hospital inpatient/outpatient or clinicWidowedANALYTIC abstract from facility WITH CoC accreditation$90,000 - $94,999Counties in metropolitan areas ge 1 million pop
6104BlackFemale2022Not PRCDAStomachC16.6-Greater curvature of stomach NOS166L9LPositive histologyGIST: Gastric and OmentalT2N0M01A30NaNNone; diagnosed at autopsyNo radiation and/or no surgery; unknown if surgery and/or radiation givenSurgery performedNone/UnknownNo/UnknownNo systemic therapy and/or surgical procedures106100000000410410098NoNoNoNoNone; no lymph node metastasesNone; no other metastasesAliveAlive or dead of other causeAlive or dead due to cancer0004Complete dates are available and there are more than 0 days of survivalAliveAliveAliveAliveOne primary onlyYesYes11058202222445878Hospital inpatient/outpatient or clinicDivorcedANALYTIC abstract from facility WITH CoC accreditation$75,000 - $79,999Counties in metropolitan areas ge 1 million pop